EQUATIONAL LOGIC
as a
PROGRAMMING LANGUAGE
Michael J. O’Donnell
The MIT Press
Cambridge, Massachusetts
London, England
Publisher’s Note
This format is intended to reduce the cost of publishing certain works in book form
and to shorten the gap between editorial preparation and final publication.
Detailed editing and composition have been avoided by photographing the text of
this book directly from the author’s prepared copy.
Second printing, 1986
© 1985 by The Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any
electronic or mechanical means (including photocopying, recording, or information
storage and retrieval) without permission in writing from the publisher.
This book was composed using the UNIX tools eqn, troff, and ms, set on the APS-
5 phototypesetter, and printed and bound in the United States of America.
Library of Congress Cataloging in Publication Data
O'Donnell, Michael J., 1952-
Equational logic as a programming language.
(MIT Press series in the foundations of computing)
Bibliography: p.
Includes index.
1. Programming languages (Electronic computers)
2. Logic, Symbolic and mathematical. 3. Equations,
Theory of. I. Title. II. Series.
QA76.7.036 1985 001.64'24 84-29507
ISBN 0-262-15028-X
To Julie and Benjamin
Table of Contents

Preface

1. Introduction -- 1
2. Using the Equation Interpreter Under UNIX (ep and ei) -- 6
3. Presenting Equations to the Equation Interpreter -- 9
4. The Syntax of Terms (loadsyntax) -- 13
   1. Standmath: Standard Mathematical Notation -- 13
   2. LISP.M: Extended LISP Notation -- 14
   3. Lambda: A Lambda Calculus Notation -- 15
   4. Inner Syntaxes (for the advanced user with a large problem) -- 17
5. Restrictions on Equations -- 20
6. Predefined Classes of Symbols -- 22
   1. integer_numerals -- 22
   2. truth_values -- 22
   3. characters -- 22
   4. atomic_symbols -- 22
7. Predefined Classes of Equations -- 24
   1. Functions on atomic_symbols -- 25
   2. Integer Functions -- 25
   3. Character Functions -- 25
8. Syntactic Qualifications on Variables -- 27
9. Miscellaneous Examples -- 30
   1. List Reversal -- 30
   2. Huffman Codes -- 31
   3. Quicksort -- 33
   4. Toy Theorem Prover -- 34
   5. An Unusual Adder -- 39
   6. Arbitrary-Precision Integer Operations -- 41
   7. Exact Addition of Real Numbers -- 46
   8. Polynomial Addition -- 51
   9. The Combinator Calculus -- 54
   10. Beta Reduction in the Lambda Calculus -- 55
   11. Lucid -- 62
10. Errors, Failures, and Diagnostic Aids -- 68
   1. Context-Free Syntactic Errors and Failures -- 69
   2. Context-Sensitive Syntactic Errors and Failures -- 69
   3. Semantic Errors and Failures -- 70
   4. Producing a Lexicon to Detect Inappropriate Uses of Symbols (el) -- 71
   5. Producing a Graphic Display of Equations in Tree Form (es) -- 71
   6. Trace Output (et) -- 73
   7. Miscellaneous Restrictions -- 74
11. History of the Equation Interpreter Project -- 75
12. Low-Level Programming Techniques -- 78
   1. A Disciplined Programming Style Based on Constructor Functions -- 78
   2. Simulation of LISP Conditionals -- 84
   3. Two Approaches to Errors and Exceptional Conditions -- 87
   4. Repairing Overlaps and Nonsequential Constructs -- 90
13. Use of Equations for Syntactic Manipulations -- 98
   1. An Improved Notation for Context-Free Grammars -- 100
   2. Terms Representing the Syntax of Terms -- 112
   3. Example: Type-Checking in a Term Language -- 115
14. Modular Construction of Equational Definitions -- 124
15. High-Level Programming Techniques -- 132
   1. Concurrency -- 132
   2. Nondeterminism vs. Indeterminacy -- 134
   3. Dataflow -- 137
   4. Dynamic Programming -- 145
16. Implementing Efficient Data Structures in Equational Programs -- 151
   1. Lists -- 151
   2. Arrays -- 157
   3. Search Trees and Tables -- 161
17. Sequential and Parallel Equational Computations -- 177
   1. Term Reduction Systems -- 177
   2. Sequentiality -- 180
   3. Left-Sequentiality -- 183
18. Crucial Algorithms and Data Structures for Processing Equations -- 187
   1. Representing Expressions -- 187
   2. Pattern Matching and Sequencing -- 191
      1. Bottom-Up Pattern Matching -- 194
      2. Top-Down Pattern Matching -- 199
      3. Flattened Pattern Matching -- 205
   3. Selecting Reductions in Nonsequential Systems of Equations -- 210
   4. Performing a Reduction Step -- 212
19. Toward a Universal Equational Machine Language -- 220
   1. Reduction Systems -- 223
   2. The Combinator Calculus, With Variants -- 226
   3. Simulation of One Reduction System by Another -- 235
   4. The Relative Powers of S-K, S-K-D, and S-K-A -- 244
   5. The S-K Combinator Calculus Simulates All Simply Strongly Sequential Term Reduction Systems -- 248
   6. The S-K-D Combinator Calculus Simulates All Regular Term Reduction Systems -- 252
   7. The Power of the Lambda Calculus -- 256
   8. Unsolved Problems -- 260
20. Implementation of the Equation Interpreter -- 262
   1. Basic Structure of the Implementation -- 262
   2. A Format for Abstract Symbolic Information -- 266
   3. Syntactic Processors and Their Input/Output Forms -- 270

Bibliography -- 277
Index -- 285
Series Foreword
Theoretical computer science has now undergone several decades of develop-
ment. The “classical” topics of automata theory, formal languages, and computa-
tional complexity have become firmly established, and their importance to other
theoretical work and to practice is widely recognized. Stimulated by technological
advances, theoreticians have been rapidly expanding the areas under study, and the
time delay between theoretical progress and its practical impact has been decreas-
ing dramatically. Much publicity has been given recently to breakthroughs in
cryptography and linear programming, and steady progress is being made on pro-
gramming language semantics, computational geometry, and efficient data struc-
tures. Newer, more speculative, areas of study include relational databases, VLSI
theory, and parallel and distributed computation. As this list of topics continues
expanding, it is becoming more and more difficult to stay abreast of the progress
that is being made and increasingly important that the most significant work be
distilled and communicated in a manner that will facilitate further research and
application of this work.
By publishing comprehensive books and specialized monographs on the
theoretical aspects of computer science, the series on Foundations of Computing
provides a forum in which important research topics can be presented in their
entirety and placed in perspective for researchers, students, and practitioners alike.
This volume, by Michael J. O’Donnell, presents an elegant and powerful interpre-
tive system for programming in terms of abstract logical equations. The language
is similar to Prolog, in that it is descriptive rather than procedural, but unlike Pro-
log its semantic description allows an efficient implementation that strictly adheres
to the given semantics. The presentation provides the definition of the language,
many examples of its use, and discussion of the relevant underlying theory. It is
essential reading for anyone interested in the latest ideas about nonprocedural pro-
gramming and practical programming language semantics.
Michael R. Garey
Preface
This book describes an ongoing equational programming project that started in
1975. Principal investigators on the project are Christoph Hoffmann and Michael
O'Donnell. Paul Chew, Paul Golick, Giovanni Sacco, and Robert Strandh partici-
pated as graduate students. I am responsible for the presentation at hand, and the
opinions expressed in it, but different portions of the work described involve each of
the people listed above. I use the pronoun "we" throughout the remainder, to indi-
cate unspecified subsets of that group. Specific contributions that can be attributed
to one individual are acknowledged by name, but much of the quality of the work
is due to untraceable interactions between several people, and should be credited to
the group.
The equational programming project never had a definite pseudocommercial
goal, although we always hoped to find genuinely useful applications. Rather than
seeking a style of computing to support a particular application, we took a clean,
simple, and elegant style of computing, with particularly elementary semantics, and
asked what it is good for. As a result, we adhered very strictly to the original con-
cept of computing with equations, even when certain extensions had obvious prag-
matic value. On the other hand, we were quite willing to change the application.
Originally, we envisioned equations as formal descriptions of interpreters for other
programming languages. When we discovered that such applications led to outra-
geous overhead, but that programs defined directly by equations ran quite competi-
tively with LISP, we switched application from interpreter generation to program-
ming with equations.
We do not apologize for our fanaticism about the foundations of equational
programming, and our cavalier attitude toward applications. We believe that good
mathematics is useful, but not always for the reasons that motivated its creation
(non-Euclidean geometry is a positive example, the calculus a negative one). Also,
while recognizing the need for programming languages that support important
applications immediately, we believe that scientific progress in the principles of pro-
gramming and programming languages is impeded by too quick a reach for appli-
cations. The utility of LISP, for example, is unquestionable, but the very adjust-
ments to LISP that give it success in many applications make it a very imprecise
vehicle for understanding the utility of declarative programming. We would rather
discover that pure equational programming, as we envision it, is unsuitable for a
particular application, than to expand the concept in a way that makes it harder to
trace the conceptual underpinnings of its success or failure.
Without committing to any particular type of application, we must experiment
with a variety of applications, else our approach to programming is pure specula-
tion. For this purpose, we need an implementation. The implementation must per-
form well enough that some people can be persuaded to use it. We interpret this
constraint to mean that it must compete in speed with LISP. Parsers, program-
ming support, and the other baggage possessed by all programming languages,
must be good enough not to get in the way, but the main effort should go toward
demonstrating the feasibility of the novel aspects, rather than solving well under-
stood problems once again.
The equational programming project has achieved an implementation of an
interpreter for equational programs. The implementation runs under Berkeley
UNIX* 4.1 and 4.2, and is available from the author for experimental use. The
current distribution is not well enough supported to qualify as a reliable tool for
important applications, but we hope to produce a stronger implementation in the
next few years. Sections 1 through 10 constitute a user's manual for the current
implementation. The remainder of the text covers a variety of topics relating to
the theory supporting equational programming, the algorithmic and organizational
problems solved in its implementation, and the special characteristics of equational
programming that qualify it for particular applications. Some sections discuss
work in progress. The intent is to give a solid intuition for all the identifiable
aspects of the project, from its esoteric theoretical foundations in logic to its
concrete implementation as a system of programs, and its potential applications.

*UNIX is a trademark of AT&T.
Various portions of the work were supported by a Purdue University XL
grant, by the National Science Foundation under grants MCS-7801812 and MCS-
8217996, and by the National Security Agency under grant 84H0006. The Purdue
University Department of Computer Sciences provided essential computing
resources for most of the implementation effort. I am grateful to Robert Strandh
and Christoph Hoffmann for critical readings of the manuscript, and to AT&T
Bell Laboratories for providing phototypesetting facilities. Typesetting was accom-
plished using the troff program under UNIX.
EQUATIONAL LOGIC
as a
PROGRAMMING LANGUAGE
1. Introduction (adapted from [HO82b])
Computer scientists have spent a large amount of research effort developing the
semantics of programming languages. Although we understand how to implement
Algol-style procedural programming languages efficiently, it seems to be very
difficult to say what the programs mean. The problem may come from choosing an
implementation of a language before giving the semantics that define correctness of
the implementation. In the development of the equation interpreter, we reversed
the process by taking clean, simple, intuitive semantics, and then looking for
correct, efficient implementations.
We suggest the following scenario as a good setting for the intuitive semantics
of computation. Our scenario covers many, but not all, applications of computing
(e.g., real-time applications are not included).
A person is communicating with a machine. The person gives a sequence of
assertions followed by a question. The machine responds with an answer or by
never answering.
The problem of semantics is to define, in a rigorous and understandable way, what
it means for the machine’s response to be correct. A natural informal definition of
correctness is that any answer that the machine gives must be a logical conse-
quence of the person’s assertions, and that failure to give an answer must mean
that there is no answer that follows logically from the assertions. If the language
for giving assertions is capable of describing all the computable functions, the
undecidability of the halting problem prevents the machine from always detecting
those cases where there is no answer. In such cases, the machine never halts. The
style of semantics based on logical consequence leads most naturally to a style of
programming similar to that in the descriptive or applicative languages such as
LISP, Lucid, Prolog, Hope, OBJ, SASL and Functional Programming languages,
although Algol-style programming may also be supported in such a way. Compu-
tations under logical-consequence semantics roughly correspond to "lazy evaluation"
of LISP [HM76, FW76].
Semantics based on logical consequence is much simpler than many other
styles of programming language semantics. In particular, the understanding of
logical-consequence semantics does not require construction of particular models
through lattice theory or category theory, as do the semantic treatments based on
the work of Scott and Strachey or those in the abstract-data-types literature using
initial or final algebras. If a program is given as a set of assertions, then the logi-
cal consequences of the program are merely all those additional assertions that
must be true whenever the assertions of the program are true. More precisely, an
equation A=B is a logical consequence of a set E of equations if and only if, in
every algebraic interpretation for which every equation in E is true, A=B is also
true (see [O'D77], Chapter 2, and Section 14 of this text for a more technical treat-
ment). There is no way to determine which one of the many models of the pro-
gram assertions was really intended by the programmer: we simply compute for
him all the information we possibly can from what we are given. For those who
prefer to think of a single model, term algebras or initial algebras may be used to
construct one model for which the true equations are precisely the logical conse-
quences of a given set of equations.
We use the language of equational logic to write the assertions of a program.
Other logical languages are available, such as the first-order predicate calculus,
used in Prolog [Ko79a]. We have chosen to emphasize the reconciliation of strict
adherence to logical consequences with good run-time performance, at the expense
of generality of the language. Current implementations of Prolog do not always
discover all of the logical consequences of a program, and may waste much time
searching through irrelevant derivations. With our language of equations, we lose
some of the expressive power of Prolog, but we always discover all of the logical
consequences of a program, and avoid searching irrelevant ones except in cases that
inherently require parallel computation. Hoffmann and O’Donnell survey the
issues involved in computing with equations in [HO82b]. Section 17 discusses the
question of relevant vs. irrelevant consequences of equations more specifically.
Specializing our computing scenario to equational languages:
The person gives a sequence of equations followed by a question, "What is E?"
for some expression E. The machine responds with an equation "E=F," where
F is a simple expression.
For our equation interpreter, the "simple expressions" above must be the normal
forms: expressions containing no instance of a left-hand side of an equation. This
assumption allows the equations to be used as rewriting rules, directing the replace-
ment of instances of left-hand sides by the corresponding right-hand sides. Sec-
tions 2 and 3 explain how to use the equation interpreter to act out the scenario
above. Our equational computing scenario is a special case of a similar scenario
developed independently by the philosophers Belnap and Steel for a logic of ques-
tions and answers [BS76].
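The rewriting reading of equations can be sketched in a few lines of Python. This is our own toy illustration, not the equation interpreter's actual algorithm; the Peano-arithmetic rules and all names in it are invented for the example:

```python
# Equations used as left-to-right rewriting rules, applied until no
# left-hand side matches. Terms are nested tuples; bare strings in
# patterns are variables. The two rules are Peano addition:
#   add(zero, y) = y
#   add(s(x), y) = s(add(x, y))
RULES = [
    (('add', ('zero',), 'y'), 'y'),
    (('add', ('s', 'x'), 'y'), ('s', ('add', 'x', 'y'))),
]

def match(pat, term, env):
    """Match pat against term, extending substitution env; None on failure."""
    if isinstance(pat, str):                       # pattern variable
        if pat in env:
            return env if env[pat] == term else None
        return {**env, pat: term}
    if not isinstance(term, tuple) or len(pat) != len(term) or pat[0] != term[0]:
        return None
    for p, t in zip(pat[1:], term[1:]):
        env = match(p, t, env)
        if env is None:
            return None
    return env

def subst(rhs, env):
    """Instantiate a right-hand side under substitution env."""
    if isinstance(rhs, str):
        return env[rhs]
    return (rhs[0],) + tuple(subst(r, env) for r in rhs[1:])

def step(term):
    """One rewriting step, outermost first; None if no rule applies anywhere."""
    for lhs, rhs in RULES:
        env = match(lhs, term, {})
        if env is not None:
            return subst(rhs, env)
    for i in range(1, len(term)):                  # otherwise try the arguments
        new = step(term[i])
        if new is not None:
            return term[:i] + (new,) + term[i + 1:]
    return None

def normalize(term):
    """Rewrite until no left-hand side matches: the result is a normal form."""
    while (new := step(term)) is not None:
        term = new
    return term

# add(s(zero), s(zero)) reduces to s(s(zero)), i.e. 1 + 1 = 2
two = normalize(('add', ('s', ('zero',)), ('s', ('zero',))))
```

Here normalize keeps replacing instances of left-hand sides by the corresponding right-hand sides, so its output contains no instance of a left-hand side, which is exactly the sense of "normal form" used above.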
The equation interpreter accepts equations as input, and automatically pro-
duces a program to perform the computations described by the equations. In order
to achieve reasonable efficiency, we impose some fairly liberal restrictions on the
form of equations given. Section 5 describes these restrictions, and Sections 6-8
and 10 present features of the interpreter. Section 15 describes the computational
power of the interpreter in terms of the procedural concepts of parallelism, non-
determinism, and pipelining.
Typical applications for which the equation interpreter should be useful are:
1. We may write quick and easy programs for the sorts of arithmetic and list-
manipulating functions that are commonly programmed in languages such as
LISP. The "lazy evaluation" implied by logical-consequence semantics allows
us to describe infinite objects in such a program, as long as only finite portions
are actually used in the output. The advantages of this capability, discussed
in [FW76, HM76], are similar to the advantages of pipelining between corou-
tines in a procedural language. Definitions of large or infinite objects may
also be used to implement a kind of automatic dynamic programming (see
Section 15.4).
2. We may define programming languages by equations, and the equation proces-
sor will produce interpreters. Thus, we may experiment with the design of a
programming language before investing the substantial effort required to pro-
duce a compiler or even a hand-coded interpreter.
3. Equations describing abstract data types may be used to produce correct
implementations automatically, as suggested by [cs78, Wa76], and imple-
mented independently in the OBJ language [FGJM85].
4. Theorems of the form A=B may sometimes be proved by receiving the same
answer to the questions "What is A?" and "What is B?" [KB70, HO88] dis-
cuss such theorem provers. REVE [Le83, FG84] is a system for developing
theorem-proving applications of equations.
5. Non-context-free syntactic checking, and semantics, such as compiler code-
generation, may be described formally by equations and used, along with the
conventional formal parsers, to automatically produce compilers (see Section
13).
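The "infinite object, finite output" behavior mentioned in application 1 has a familiar procedural analogue in generators. This Python fragment is an analogy of ours, not equational code: it builds the conceptually infinite list of natural numbers and demands only a finite prefix of it.

```python
from itertools import islice

def naturals():
    """Conceptually the infinite term cons(0, cons(1, cons(2, ...)))."""
    n = 0
    while True:
        yield n
        n += 1

# Only the demanded portion of the infinite object is ever computed,
# just as lazy evaluation computes only the finite part used in the output.
first_five = list(islice(naturals(), 5))
```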
The equation interpreter is intended for use by two different classes of user, in
somewhat different styles. The first sort of user is interested in computing results
for direct human consumption, using well-established facilities. This sort of user
should stay fairly close to the paradigm presented in Section 2, should take the
syntactic descriptions as fixed descriptions of a programming language, and should
skip Section 20, as well as other sections that do not relate to the problem at hand.
The second sort of user is building a new computing product, that will itself be
used directly or indirectly to produce humanly readable results. This sort of user
will almost certainly need to modify or redesign some of the syntactic processors,
and will need to read Sections 13 and 20 rather closely in order to understand how
to combine equationally-produced interpreters with other sorts of programs. The
second sort of user is encouraged to think of the equation interpreter as a tool,
analogous to a formal parser constructor, for building whichever parts of his pro-
duct are conveniently described by equations. These equational programs may then
be combined with programs produced by other language processors to perform
those tasks not conveniently implemented by equations. The aim in using equations
should be to achieve the same sort of self-documentation and ease of modification
that may be achieved by formal grammars, in solving problems where context-free
manipulations are not sufficiently powerful.
2. Using the Equation Interpreter Under UNIX (ep and ei)
Use of the equation interpreter involves two separate steps: preprocessing and
interpreting. The preprocessing step, like a programming language compiler,
analyzes the given equations and produces machine code. The interpreting step,
which may be run any number of times once preprocessing is done, reduces a given
term to normal form.
Normal use of the equation interpreter requires the user to create a directory
containing 4 files used by the interpreter. The 4 files to be created are:
1. definitions - containing the equations;
2. pre.in - an input parser for the preprocessor;
3. int.in - an input parser for the interpreter;
4. int.out - an output pretty-printer for the interpreter.
The file definitions, discussed in Section 3, is usually typed in literally by the user.
The files pre.in, int.in and int.out, which must be executable, are usually produced
automatically by the command loadsyntax, as discussed in Section 4.
To invoke the preprocessor, type the following command to the shell
ep Equnsdir
where Equnsdir is the directory in which you have created the 4 files above. If no
directory is given, the current directory is used. Ep will use Equnsdir as the home
for several temporary files, and produce in Equnsdir an executable file named
interpreter. Because of the creation and removal of temporary files, the user should
avoid placing any extraneous files in Equnsdir. Two of the files produced by ep
are not removed: def.deep and def.in. These files are not strictly necessary for
operation of the interpreter, and may be removed in the interest of space
conservation, but they are useful in building up complex definitions from simpler
ones (Section 14) and in producing certain diagnostic output (Section 10). To
invoke the interpreter, type the command:
ei Equnsdir
A term found on standard input will be reduced, and its normal form placed on the
standard output.
A paradigmatic session with the equation interpreter has the following form:
mkdir Equnsdir
loadsyntax Equnsdir
edit Equnsdir/definitions using your favorite editor
ep Equnsdir
edit input using your favorite editor
ei Equnsdir <input
The sophisticated user of UNIX may invoke ei from his favorite interactive editor,
such as ned or emacs, in order to be able to simultaneously manipulate the input
and output.
In more advanced applications, if several equation interpreters are run in a
pipeline, repeated invocation of the syntactic processors may be avoided by invok-
ing the interpreters directly, instead of using ei. For example, if Equ.1, Equ.2,
Equ.3 are all directories in which equational interpreters have been compiled, the
following command pipes standard input through all three interpreters:
Equ.1/int.in | Equ.1/interpreter | Equ.2/interpreter |
Equ.3/interpreter | Equ.3/int.out
Use of ei for the same purpose would involve 4 extra invocations of syntactic pro-
cessors, introducing wasted computation and, worse, the possibility that superficial
aspects of the syntax, such as quoting conventions, may affect the results. If
Equ.1, Equ.2, and Equ.3 are not all produced using the same syntax, careful con-
sideration of the relationship between the different syntaxes will be needed to make
sense of such a pipe.
After specifying the directory containing definitions, the user may give the size
of the workspace to be used in the interpreter. This size defaults to
2^15-1=32767, the largest that can be addressed in one 16-bit word with a sign
bit. The workspace size limits the size of the largest expression occurring as an
intermediate
step in any reduction of an input to normal form. The effect of the limit is blurred
somewhat by sharing of equivalent subexpressions, and by allocation of space for
declared symbols even when they do not actually take part in a particular computa-
tion. For example, to reduce the interpreter workspace to half of the default, type
ep Equnsdir 16384
The largest workspace usable in the current implementation is
2^31-2=2147483646. The limiting factor is the Berkeley Pascal compiler, which
will not process a constant bigger than 2^31-1=2147483647, and which produces
mysteriously incorrect assembly code for an allocation of exactly that much. On
current VAX Unix implementations, the shell may often refuse to run sizes much
larger than the default because of insufficient main memory. In such a case, the
user will see a message from the shell saying "not enough core" or "too big".
3. Presenting Equations to the Equation Interpreter
Input to the equation interpreter, stored in the file definitions, must be of the fol-
lowing form:
Symbols
    symbol_descriptor_1;
    symbol_descriptor_2;
    ...
    symbol_descriptor_m.
For all variable_1, variable_2, ..., variable_n:
    equation_1;
    equation_2;
    ...
    equation_k.
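As a concrete illustration of this layout, a definitions file for list concatenation might read as follows. This is our own toy example, not one from the text; the exact rendering of the nullary nil may vary with the chosen term syntax (see Section 4).

```
Symbols
    cons: 2;
    nil: 0;
    append: 2.
For all x, y, z:
    append(nil(), z) = z;
    append(cons(x, y), z) = cons(x, append(y, z)).
```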
The principal keywords recognized by the preprocessor are Symbols, For all, and
Equations, appearing at the beginning of a line. Equations is an alternative to
For all used in the unusual case that there are no variables in the equations. Cap-
italization of these keywords is optional, and any number of blanks greater than 0
may appear between For and all. The only other standard keywords are include,
where, end where, is, are, in, either, or, and end or. The special symbols used by
the preprocessor are ":", ";", ".", ",", "", and "". Particular term syntaxes (see Sec-
tion 4) may entail other keywords and special symbols. Blanks are required only
where necessary to separate alphanumeric strings. Any line beginning with "" is a
comment, with no impact on the meaning of a specification.
symbol_descriptors indicate one or more symbols in the language to be
defined, and give their arities. Intuitively, symbols of arity 0 are the constants of
the language, and symbols of higher arity are the operators. A symbol_descriptor
is either of the form
symbol_1, symbol_2, ..., symbol_m: arity        m >= 1
or of the form
include symbol_class_1, ..., symbol_class_n        n >= 1
Syntactically, symbols and symbol_classes are identifiers: strings other than key-
words beginning with an alphabetic symbol followed by any combination of alpha-
betic symbols, base-ten digits, "_", and "-". Identifiers are currently limited to 20
characters, a restriction which will be removed in future versions. A symbol_class
indicates the inclusion of a large predefined class of symbols. These classes are dis-
cussed in Section 6. Symbols that have been explicitly declared in the Symbols
section are called literal symbols, to distinguish them from members of the
predefined classes.
variables are identifiers, of the same sort as symbols. An equation is either of
the form
term_1 = term_2
of the form
term_1 = term_2 where qualification end where
or of the form
include equation_class_1, ..., equation_class_n
The syntax of terms is somewhat flexible, and is discussed in Section 4.
qualifications are syntactic constraints on substitutions for variables, and are dis-
cussed in Section 8. equation_classes are identifiers indicating the inclusion of a
large number of predefined equations. These classes are discussed in Section 7.
For larger problems, the notation presented in this section will surely not be
satisfactory, because it provides no formal mechanism for giving structure to a
large definition. Section 14 describes a set of operators that may be applied to one
or more equational definitions to produce useful extensions, modifications, and com-
binations of the definitions. The idea for these definition-constructing operators
comes from work on abstract data types by Burstall and Goguen, implemented in
the language OBJ [BG77]. Users are strongly encouraged to start using these
operators as soon as a definition begins to be annoyingly large. The current version
does not implement operators on definitions, so most users will not want to attack
large problems until a more advanced version is released.
The syntax presented in this section is collected in the BNF below.
<program> ::= Symbols <symbol descriptor list>.
              For all <variable list>: <equation list>.
<symbol descriptor list> ::= <symbol descriptor>; ... ; <symbol descriptor>
<symbol descriptor> ::= <symbol list>: <arity> |
                        include <symbol class list>
<symbol class list> ::= <symbol class>, ... , <symbol class>
<symbol class> ::= atomic_symbols | integer_numerals | truth_values
<symbol list> ::= <symbol>, ... , <symbol>
<symbol> ::= <identifier>
<arity> ::= <number>
<variable list> ::= <variable>, ... , <variable>
<variable> ::= <identifier>
<equation list> ::= <equation>; ... ; <equation>
<equation> ::= <term> = <term> |
               <term> = <term> where <qualification> end where |
               include <equation class list>
<qualification> ::= <qualification item list>
<qualification item list> ::= <qualification item>, ... , <qualification item>
<qualification item> ::= <variable> is <qualified term> |
                         <variable list> are <qualified term>
<qualified term> ::= in <symbol class> |
                     <term> |
                     <qualified term> where <qualification> end where |
                     either <qualified term list> end or
<qualified term list> ::= <qualified term> or ... or <qualified term>
4. The Syntax of Terms (loadsyntax)
Since no single syntax for terms is acceptable for all applications of the equation
interpreter, we provide a library of syntaxes from which the user may choose the
one best suited to his application. The more sophisticated user, who wishes to
custom-build his own syntax, should see Section 20 on implementation to learn the
requirements for parsers and pretty-printers.
To choose a syntax from the current library, type the command
loadsyntax Equnsdir Syntax
where Equnsdir is the directory containing the preprocessor input, and Syntax is
the name of the syntax to be seen by the user. Loadsyntax will create the
appropriate pre.in, int.in, and int.out files in Equnsdir to process the selected syn-
tax. If Syntax is omitted, LISP.M is used by default. If Equnsdir is also omitted,
the current directory is used.
In order to distinguish atomic symbols from nullary literal symbols in input to
the interpreter, the literal symbols must be written with an empty argument list.
Thus, in Standmath notation, a() is a literal symbol, and a is an atomic symbol.
In LISP.M, the corresponding notations are a[] and a. This regrettable notational
clumsiness should disappear in later versions.
4.1 Standmath: Standard Mathematical Notation
The Standmath syntax is standard mathematical functional prefix notation, with
arguments surrounded by parentheses and separated by commas, such as
f(g(a,b),c,h(e)). Empty argument lists are allowed, as in f(). This syntax is
used as the standard of reference (but is not the default choice); all others are
described as special notations for Standmath terms.
4.2 LISP.M: Extended LISP Notation
LISP.M is a liberal LISP notation, which mixes M-expression notation freely with
S-expressions [McC60]. Invocation of LISP.M requires declaration of the nullary
symbol nil and the binary symbol cons. An M-expression accepted by LISP.M
may be in any of the following forms:
    atomic_symbol
    nil()
    (M-expr_1 M-expr_2 ... M-expr_m)    m ≥ 0
    (M-expr_1 M-expr_2 ... M-expr_(n-1) . M-expr_n)    n > 1
    function[M-expr_1; M-expr_2; ... M-expr_p]    p ≥ 0
(M-expr_1 ... M-expr_(n-1) . M-expr_n)
is a notation for
cons(M-expr_1, ... cons(M-expr_(n-1), M-expr_n) ... )
(M-expr_1 ... M-expr_m)
is a notation for
cons(M-expr_1, cons(M-expr_2, ... cons(M-expr_m, nil()) ... ))
function[M-expr_1; ... M-expr_p]
is a notation for
function(M-expr_1, ... M-expr_p)
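The three list abbreviations can be sketched in Python. The tuple encoding of terms below is our own illustration, not the interpreter's internal representation.

```python
# Sketch of the LISP.M list abbreviations (our tuple encoding, not the
# interpreter's): a term is an atomic-symbol string, ("nil",), or
# ("cons", head, tail).

def s_list(exprs):
    # (M-expr_1 ... M-expr_m)  =>  cons(M-expr_1, ... cons(M-expr_m, nil()) ...)
    term = ("nil",)
    for e in reversed(exprs):
        term = ("cons", e, term)
    return term

def dotted_list(exprs, last):
    # (M-expr_1 ... M-expr_(n-1) . M-expr_n)
    #   =>  cons(M-expr_1, ... cons(M-expr_(n-1), M-expr_n) ...)
    term = last
    for e in reversed(exprs):
        term = ("cons", e, term)
    return term

assert s_list(["a", "b"]) == ("cons", "a", ("cons", "b", ("nil",)))
assert dotted_list(["a", "b"], "c") == ("cons", "a", ("cons", "b", "c"))
```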
4.3 Lambda: A Lambda Calculus Notation
Lambda notation is intended for use in experiments with evaluation strategies for
the lambda calculus. This notation supports the most common abbreviations con-
veniently, while allowing unusual sorts of expressions to be described at the cost of
less convenient notation. Because of the highly experimental nature of this syntax,
less attention has been given to providing useful error messages. Since this lambda
notation was developed to support one particular series of experiments with reduc-
tion strategies, it will probably not be suitable for all uses of the lambda calculus.
\x.E
is a notation for
Lambda(cons(x, nil()), E)
where x must be an atomic symbol representing a variable.
(E F)
is a notation for
AP(E, F)
In principle, the notations above are sufficient for describing arbitrary lambda
terms, but for convenience, multiple left-associated applications may be given with
only one parenthesis pair. Thus,
(E_1 E_2 E_3 ... E_n)    n ≥ 2
is a notation for
AP( ... AP(AP(E_1, E_2), E_3), ... E_n)
Similarly, many variables may be lambda bound by a single use of "\". Thus,
\x_1 x_2 ... x_n.E    n ≥ 1
is a notation for
Lambda(cons(x_1, cons(x_2, ... cons(x_n, nil()) ... )), E)
Notice that the list of variables is given as a LISP list, rather than the more con-
ventional representation as
Lambda(x_1, Lambda(x_2, ... Lambda(x_n, E) ... ))
It is easy to write equations to translate the listed-variable form into the more con-
ventional representation, but the listed form allows reduction strategies to take
advantage of nested Lambdas. In order to write equations manipulating lists of
variables, it is necessary to refer to a list of unknown length. So,
\x_1 x_2 ... x_n:rem.E    n ≥ 0
is a notation for
Lambda(cons(x_1, cons(x_2, ... rem) ... ), E)
That is, rem above represents the remainder of the list beyond x_1 ... x_n. In the
special case where n=0,
\:list.E
is a notation for
Lambda(list, E)
In order to deal with special internal forms, such as de Bruijn notation [deB72],
the form
\i.E
is allowed as a notation for
Lambda(i, E)
where i is an integer numeral. If function symbols other than Lambda and AP
must be introduced, a bracketed style of function application may be used, in
which
f[E_1; ... E_n]    n ≥ 0
is a notation for
f(E_1, ... E_n)
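The multi-application and multi-variable abbreviations can be sketched in Python; as before, the tuple encoding of Lambda/AP terms is our own, for illustration only.

```python
# Sketch of the Lambda-notation abbreviations (our tuple encoding).

def ap_chain(exprs):
    # (E_1 E_2 ... E_n)  =>  AP( ... AP(AP(E_1, E_2), E_3), ... E_n)
    term = exprs[0]
    for e in exprs[1:]:
        term = ("AP", term, e)
    return term

def lam(variables, body, rest=("nil",)):
    # \x_1 ... x_n.E      =>  Lambda(cons(x_1, ... cons(x_n, nil()) ...), E)
    # \x_1 ... x_n:rem.E  =>  Lambda(cons(x_1, ... rem) ... , E)
    var_list = rest
    for v in reversed(variables):
        var_list = ("cons", v, var_list)
    return ("Lambda", var_list, body)

assert ap_chain(["E", "F", "G"]) == ("AP", ("AP", "E", "F"), "G")
assert lam(["x"], "E") == ("Lambda", ("cons", "x", ("nil",)), "E")
assert lam([], "E", rest="list") == ("Lambda", "list", "E")   # \:list.E
```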
4.4 Inner Syntaxes (for the advanced user with a large problem)
Independently of the surface syntax in which terms are written, it may be helpful
to use different internal representations of terms for different purposes. For exam-
ple, instead of having a number of function symbols of different arities, it is some-
times convenient to use only one binary symbol, AP, representing function applica-
tion, and to represent all other functions by nullary symbols. Application of a
function to multiple arguments is represented by a sequence of separate applica-
tions, one for each argument. The translation from standard notation to this appli-
cative notation is often called Currying. For example, the term
f(g(a, b), h(c))
is Curried to
AP(AP(f, AP(AP(g, a), b)), AP(h, c)).
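The Currying translation can be sketched in Python on a tuple representation of terms (the representation is ours, not the interpreter's):

```python
# Sketch of the Currying translation: a term is a symbol string or a
# tuple (symbol, arg_1, ..., arg_n).

def curry(term):
    # f(t1, ..., tn)  =>  AP( ... AP(AP(f, curry(t1)), curry(t2)) ..., curry(tn))
    if isinstance(term, str):          # nullary symbol or variable
        return term
    head, args = term[0], term[1:]
    result = head
    for a in args:
        result = ("AP", result, curry(a))
    return result

t = ("f", ("g", "a", "b"), ("h", "c"))
assert curry(t) == ("AP", ("AP", "f", ("AP", ("AP", "g", "a"), "b")),
                    ("AP", "h", "c"))
```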
Since the performance of the pattern-matching techniques used by the equation
interpreter is affected by the internal representation of the patterns, it may be
important to choose the best such representation in order to solve large problems.
The current version of the system is not particularly sensitive to such choices, but
earlier versions were, and later versions may again be so. In order to use an alter-
nate internal representation, type
loadsyntax Equnsdir Outersynt Innersynt
where Outersynt is one of the syntaxes described in Sections 4.1-4.3, and Innersynt
is the name of the chosen internal representation. Currently, only two internal
representations are available. Standmath is the standard mathematical notation,
so
loadsyntax Equnsdir Outersynt Standmath
is equivalent to
loadsyntax Equnsdir Outersynt
The other internal representation is Curry, described above.
5. Restrictions on Equations
In order for the reduction strategies used by the equation interpreter to be correct
according to the logical-consequence semantics, some restrictions must be placed on
the equations. The user may learn these restrictions by study, or by trial and error,
since the preprocessor gives messages about each violation. Presently, 5 restrictions
are enforced:
1. No variable may be repeated on the left side of an equation. For instance,
f(y,y) = y
is prohibited, because of the 2 instances of y on the left side.
2. Every variable appearing on the right side of an equation must also appear on
the left. For instance, f(x)=y is prohibited.
3. Two different left sides may not match the same expression. So the pair of
equations
g(0,x) = 0; g(x,1) = 0
is prohibited, because both of them apply to g(0,1).
4. When two (not necessarily different) left-hand sides match two different parts
of the same expression, the two parts must not overlap. E.g., the pair of equa-
tions
first(pred(x)) = predfunc; pred(succ(x)) = x
is prohibited, since the left-hand sides overlap in first(pred(succ(0))).
5. It must be possible, in a left-to-right preorder traversal of any term, to iden-
tify an instance of a left-hand side without traversing any part of the term
below that instance. This property is called left-sequentiality. For example,
the pair of equations
f(g(x, a), y) = 0; g(b, c) = 1
is prohibited, since after scanning f(g it is impossible to decide whether to
look at the first argument to g in hopes of matching the b in the second equa-
tion, or to skip it and try to match the first equation.
Violations of left-sequentiality may often be avoided by reordering the argu-
ments to a function. For example, the disallowed equations above could be
replaced by f(g(a,x),y) = 0 and g(c,b) = 1. Left-sequentiality does not neces-
sarily imply that leftmost-outermost evaluation will work. Rather, it means that in
attempting to create a redex at some point in a term, the evaluator can determine
whether or not to perform reductions within a leftward portion of the term without
looking at anything to the right. Left-sequentiality is discussed in more detail in
Sections 17 and 18.3.
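Restrictions 1 and 2 are simple syntactic checks, and can be sketched mechanically in Python. The term representation and the convention that upper-case strings stand for variables are ours, for illustration only.

```python
# Sketch of checks for restrictions 1 and 2. A pattern is a tuple
# (symbol, subterm, ...); by our convention here, strings beginning
# with an upper-case letter stand for variables.

def variables(term, acc=None):
    acc = [] if acc is None else acc
    if isinstance(term, str):
        if term[0].isupper():
            acc.append(term)
    else:
        for sub in term[1:]:
            variables(sub, acc)
    return acc

def check_rule(lhs, rhs):
    lvars = variables(lhs)
    if len(lvars) != len(set(lvars)):
        return "violates restriction 1: repeated variable on the left"
    if not set(variables(rhs)) <= set(lvars):
        return "violates restriction 2: right-side variable not on the left"
    return "ok"

assert check_rule(("f", "Y", "Y"), "Y") != "ok"   # f(y,y) = y
assert check_rule(("f", "X"), "Y") != "ok"        # f(x) = y
assert check_rule(("f", "X"), "X") == "ok"
```

Checking restrictions 3 through 5 requires comparing left-hand sides against each other (unification and a sequentiality analysis), and is not sketched here.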
All five of these restrictions are enforced by the preprocessor. Violations pro-
duce diagnostic messages and prevent compiling of an interpreter. The left-
sequentiality restriction (5) subsumes the nonoverlapping restriction (4), but later
versions of the system will remove the sequentiality constraint. Later versions will
also relax restriction (3) to allow compatible left-hand sides when the right-hand
sides agree.
6. Predefined Classes of Symbols
It is sometimes impossible to list in advance all of the symbols to be processed by a
particular set of equations. Therefore, we allow 4 predefined classes of symbols to
be invoked by name. These classes consist entirely of constants, that is, nullary
symbols.
6.1. integer_numerals
The integer_numerals include all of the sequences of base-10 digits, optionally pre-
ceded by "-". Numerals are limited to fit in a single machine word: the range
-2147483647 to +2147483647 on the current VAX implementation. Later versions
will use the operators of Section 14 to provide arbitrary precision integer arith-
metic.
6.2. truth_values
The truth_values are the symbols true and false. They are included as a
predefined class for standardization.
6.3. characters
The characters are ASCII characters, presented in single or double
quotes. The only operations available are conversions between characters
and integer numerals. Later versions will use the operators of Section 14 to pro-
vide arbitrarily long character strings, and some useful string-manipulating opera-
tions.
6.4. atomic_symbols
The atomic_symbols are structureless symbols whose only detectable relations are
equality and inequality. Every identifier different from true and false, and not
having any arguments, is taken to be an atomic symbol. In order to distinguish
nullary literal symbols from atomic symbols, the literal symbols are given null
strings of arguments, such as lit() (in Standmath notation) and lit[] (in LISP.M
notation). Currently, atomic_symbols are limited to lengths from 0 to 20. Later
versions will use the operators of Section 14 to provide arbitrarily long
atomic_symbols.
Section 7 describes predefined functions which operate on these classes of sym-
bols.
7. Predefined Classes of Equations
The predefined classes of equations described in this section were introduced to
provide access to selected machine instructions, particularly those for arithmetic
operations, without sacrificing the semantic simplicity of the equation interpreter,
and without introducing any new types of failure, such as arithmetic overflow.
Only those operations that are extremely common and whose implementations in
machine instructions bring substantial performance benefits are included. The
intent is to provide a minimal set of predefined operations from which more power-
ful operations may be defined by explicitly-given equations. So, every predefined
operation described below has the same effect as a certain impractically large set of
equations, and the very desirable extensions of these sets of equations to handle
multiple-word objects are left to be done by explicitly-given equations in later ver-
sions.
For each predefined class of symbols, there are predefined classes of equations
defining standard functions for those symbols. Some of the functions produce
values in another class than the class of the arguments. Predefined classes of
equations allow a user to specify a prohibitively large set of equations concisely,
and allow the implementation to use special, more efficient techniques to process
those equations than are used in general. When a predefined class of functions is
invoked, all of the relevant function symbols and classes of symbols must be
declared as well. We will describe the functions defined for each class of symbols.
The associated class of equations is the complete graph of the function. For exam-
ple, the integer function add has the class of equations containing add(0,0)=0,
add(0,1)=1, ..., add(1,0)=1, add(1,1)=2, ... .
7.1. Functions on atomic_symbols
equatom equ(x,y) = true if x=y,
false otherwise
7.2. Integer Functions
multint    multiply(x,y) = x * y
divint     divide(x,y) = the greatest integer ≤ x/y, if y ≠ 0
modint     modulo(x,y) = x - (y*divide(x,y)) if y ≠ 0,
           x otherwise
addint     add(x,y) = x + y
subint     subtract(x,y) = x - y
equint     equ(x,y) = true if x=y,
           false otherwise
lessint    less(x,y) = true if x<y,
           false otherwise
An expression starting with the function divide will not be reduced at all if the
second argument is 0. Thus, the output will give full information about the
erroneous use of this function. Similarly, additions and multiplications leading to
overflow will simply not be performed. Later versions will perform arbitrary preci-
sion arithmetic (see Section 9.6), removing this restriction.
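This "decline to reduce" behavior can be sketched in Python, modeling an unreduced redex by returning None (the modeling convention is ours):

```python
# Sketch of the partial predefined integer operations: on division by
# zero or single-word overflow the redex is simply not reduced, modeled
# here by returning None.

MAXINT = 2147483647

def divide(x, y):
    if y == 0:
        return None                  # not reduced at all
    return x // y                    # greatest integer <= x/y

def modulo(x, y):
    if y == 0:
        return x                     # modulo(x,0) = x by definition
    return x - y * divide(x, y)

def add(x, y):
    s = x + y
    return s if -MAXINT <= s <= MAXINT else None   # overflow: no reduction

assert divide(7, 2) == 3 and divide(-7, 2) == -4
assert divide(5, 0) is None
assert modulo(7, 2) == 1 and modulo(7, 0) == 7
assert add(MAXINT, 1) is None
```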
7.3. Character Functions
equchar equ(x,y) = true if x=y,
false otherwise
intchar char(i) = the ith character in a standard ordering
charint seqno(x) = the position of x in a standard ordering
An application of char to an integer outside of the range 0 to 2^7 - 1 = 127, or an
application of seqno to a string of length other than 1 will not be reduced. Later
versions will use the operations of Section 14 to provide useful string-manipulating
operations for arbitrarily long character strings.
8. Syntactic Qualifications on Variables
Even with a liberal set of predefined functions, there will arise cases where the set
of equations that a user wants to include in his definition is much too large to ever
type by hand. For example, in defining a LISP interpreter, it is important to define
the function atom, which tests for atomic symbols. The natural set of equations to
define this function includes atom(cons(x,y))=false, atom(a)=true,
atom(b)=true, ..., atom(aa)=true, atom(ab)=true, ... . We would like to abbre-
viate this large set of equations with the following two:
atom(cons(x,y)) = false;
atom(x) = true where x is either
in atomic_symbols
or in integer_numerals
end or
end where
Notice that the qualification placed on the variable x is essentially a syntactic,
rather than a semantic, one. In general, we allow equations of the form:
term = term where qualification end where
A qualification is of the form
qualification_item_1, ..., qualification_item_m    m ≥ 1
and qualification_items are of the forms
variable is qualification_term
variable_1, ..., variable_n are qualification_term
and qualification_terms are of the forms
in predefined_symbol_class
term
qualification_term where qualification end where
either qualification_term_1 or ... or qualification_term_n end or
Examples illustrating the forms above:
atompair_or_atom(x) = true
where x is either
cons(y,z) where y,z are in atomic_symbols end where
or in atomic_symbols
end where;
atom_int_pair(x) = true
where x is cons(y,z)
where y is in atomic_symbols,
z is in integer_numerals
end where
end where
If the same variable is mentioned in two different nested qualifications, the inner-
most qualification applies.
The interpretation of the restrictions on equations in Section 5 is not obvious
in the presence of qualified equations. Restrictions 1 and 2, regarding the appear-
ance of variables on left and right sides of equations, are applied to the unqualified
equations, ignoring the qualifying clauses. Restrictions 4 and 5, regarding possible
interactions between left sides, are applied to the results of substituting variable
qualifications for the instances of variables that they qualify. For example, the
equation
f(x) = y where x is g(y) end where
is prohibited, because the variable y is not present on the unqualified left side, and
the pair of equations
f(x) = 0 where x is g(y) end where; g(x) = 1;
is prohibited because of the overlap in f(g(a)). In general, a variable occurring in
a where clause is local to that clause, so g(x,y) =z where x is y is equivalent to
g(x,y) =z, rather than g(x,x) =z. The details of interactions between variable
bindings and where clauses certainly need more thought, but fortunately the subtle
cases do not occur very often.
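The "either ... end or" class qualification amounts to a membership test at matching time, which can be sketched in Python (the encoding of terms and classes is our own illustration):

```python
# Sketch of class qualifications as membership tests: atomic symbols are
# strings, integer numerals are ints, compound terms are tuples.

def is_atomic(t):
    return isinstance(t, str)

def is_integer(t):
    return isinstance(t, int)

CLASSES = {"atomic_symbols": is_atomic, "integer_numerals": is_integer}

def qualified_match(term, allowed_classes):
    # x is either in C_1 or ... or in C_k end or
    return any(CLASSES[c](term) for c in allowed_classes)

# atom(x) = true where x is either in atomic_symbols
#                             or in integer_numerals end or end where
q = ["atomic_symbols", "integer_numerals"]
assert qualified_match("a", q)
assert qualified_match(42, q)
assert not qualified_match(("cons", "a", "b"), q)
```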
9. Miscellaneous Examples
This section contains examples of complete equational programs that do not fit any
specific topic, but help give a general feeling for the capabilities of the interpreter.
The first ones are primitive, and should be accessible to every reader, but later
ones, such as the lambda-calculus example, are intended only for the reader
whose specialized interests agree with the topic.
9.1. List Reversal
The following example, using the LISP.M syntax, is chosen for its triviality. The
operation of reversal (rev) is defined using the operation of adding an element to
the end of a list (addend). A trace of this example shows that the number of steps
to reverse a list of length n is proportional to n^2. Notice that the usual LISP opera-
tors car and cdr (first element, remaining elements of a list) are not needed,
because of the ability to nest operation symbols on the left-hand sides of equations.
This example has no advantage over the corresponding LISP program, other than
transparency of notation. It is easy to imagine a compiler that would translate
equational programs of this sort into LISP in a very straightforward way.
Symbols
: List constructors
cons: 2;
nil: 0;
: Operators for list manipulation
rev: 1;
addend: 2;
include atomic_symbols.
For all x,y,z:
    rev[()] = ();
    rev[(x . y)] = addend[rev[y]; x];
    addend[(); x] = (x);
    addend[(x . y); z] = (x . addend[y; z]).
The following equations redefine list reversal in such a way that the equation
interpreter will perform a linear-time algorithm. Just like the naive quadratic time
version above, these equations may be compiled into a LISP program in a very
straightforward way.
Symbols
cons: 2;
nil: 0;
rev: 1;
    apprev: 2;
include atomic_symbols.
For all x,y,z:
    rev[x] = apprev[x; ()];
    : apprev[x; z] is the result of appending z to the reversal of x.
    apprev[(); z] = z;
    apprev[(x . y); z] = apprev[y; (x . z)].
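Both definitions transcribe almost directly into a functional style. The Python transcription below (cons-lists as nested pairs) is ours, intended only to clarify the equations.

```python
# Transcription sketch of the two reversal definitions; a cons-list is a
# pair (head, tail) or None for nil.

def cons(x, l):
    return (x, l)

NIL = None

def addend(l, x):                  # add x at the end of l (quadratic helper)
    return cons(x, NIL) if l is NIL else cons(l[0], addend(l[1], x))

def rev(l):                        # rev[(x . y)] = addend[rev[y]; x]
    return NIL if l is NIL else addend(rev(l[1]), l[0])

def apprev(l, z):                  # linear version with an accumulator
    return z if l is NIL else apprev(l[1], cons(l[0], z))

def rev2(l):                       # rev[x] = apprev[x; ()]
    return apprev(l, NIL)

l = cons("a", cons("b", cons("c", NIL)))
assert rev(l) == rev2(l) == cons("c", cons("b", cons("a", NIL)))
```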
9.2. Huffman Codes
The following definition of an operator producing Huffman codes [Hu52, AHU83]
as binary trees is a little bit clumsier than the list reversals above to translate into
LISP, since the operator Huff, a new constructor combining a partially-constructed
Huffman tree with its weight, would either be omitted in a representing S-
expression, or encoded as an atomic symbol. Either way, the list constructing
operator is overloaded with two different intuitive meanings, and the expressions
become a bit harder to read.
The following equations produce Huffman codes in the form of binary trees
constructed with cons. To produce the Huffman tree for the keys K_1, K_2, ..., K_n
with weights w_1, w_2, ..., w_n in decreasing numerical order, evaluate the term
BuildHuff[(Huff[w_1; K_1] ... Huff[w_n; K_n])].
Symbols
: List construction operators
cons: 2;
nil: 0;
    : Huff[w; t] represents the tree t (built from cons) having weight w.
Huff: 2;
: Tree building operators
    BuildHuff: 1;
Insert: 2;
Combine: 2;
: Arithmetic and logical symbols and operators
add: 2;
less: 2;
include truth_values;
include integer_numerals;
include atomic_symbols.
For all weight1, weight2, tree1, tree2, x, y, remainder, item:
    : if is the standard conditional function, and add, less
    : are the standard arithmetic operation and test.
    if[true; x; y] = x; if[false; x; y] = y;
    include addint, lessint;
    : BuildHuff[list] assumes that its argument is a list of weighted trees, in
    : decreasing order by weight, and combines the trees into a single tree
    : representing the Huffman code for the given weights.
    BuildHuff[(Huff[weight1; tree1])] = tree1;
    BuildHuff[(x y . remainder)] =
        BuildHuff[Insert[remainder; Combine[x; y]]]
        where x, y are Huff[weight1; tree1] end where;
    : Insert[list; tree] inserts the given weighted tree into the given list of weighted
    : trees according to its weight. Insert assumes that the list is in decreasing
    : order by weight.
    Insert[(); item] = (item);
    Insert[(Huff[weight2; tree2] . remainder); Huff[weight1; tree1]] =
        if[less[weight1; weight2];
            (Huff[weight1; tree1] Huff[weight2; tree2] . remainder);
            (Huff[weight2; tree2] . Insert[remainder; Huff[weight1; tree1]])];
    : Combine[t1; t2] is the combination of the weighted trees t1 and t2 resulting
    : from hanging t1 and t2 from a common root, and adding their weights.
    Combine[Huff[weight1; tree1]; Huff[weight2; tree2]] =
        Huff[add[weight1; weight2]; (tree1 . tree2)].
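The same construction can be sketched in Python. Here we keep the working list in increasing order of weight, so that combining the first two elements is the standard Huffman step; the list manipulation mirrors Insert and Combine, but the representation and ordering convention are ours.

```python
# Sketch of the Huffman construction: weighted trees are (weight, tree)
# pairs, combined trees are pairs of subtrees (the cons of the program).

def build_huff(weighted):
    trees = sorted(weighted)               # lightest first
    while len(trees) > 1:
        (w1, t1), (w2, t2) = trees[0], trees[1]
        combined = (w1 + w2, (t1, t2))     # Combine: hang from a common root
        rest = trees[2:]
        i = 0                              # Insert: keep the list in weight order
        while i < len(rest) and rest[i][0] < combined[0]:
            i += 1
        trees = rest[:i] + [combined] + rest[i:]
    return trees[0][1]

tree = build_huff([(5, "a"), (9, "b"), (12, "c"), (13, "d"),
                   (16, "e"), (45, "f")])
assert tree == ("f", (("c", "d"), (("a", "b"), "e")))
```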
9.3. Quicksort
The following equational program sorts a list of integers by the Quicksort pro-
cedure [Ho62, AHU74]. This program may also be translated easily into LISP.
The fundamental idea behind Quicksort is that we may sort a list l by choosing a
value i (usually the first value in l), splitting l into the lists l_<, l_=, and l_> of ele-
ments <i, =i, and >i, respectively. Then, sort l_< and l_> (l_= is already sorted),
and append them to get the sorted version of l. Quicksort sorts a list of n elements
in time O(n log n) on the average.
Symbols
: List construction operators
cons: 2;
nil: 0;
: List manipulation operators
smaller, larger: 2;
append: 2;
sort: 1;
    : Logical and arithmetic operators and symbols
if: 3;
less: 2;
include integer_numerals, truth_values.
For all i, j, a, b, rem:
    sort[()] = ();
    sort[(i . rem)] = append[sort[smaller[i; rem]]; append[(i); sort[larger[i; rem]]]];
    : smaller[i; a] is a list of the elements of a smaller than or equal to the integer i.
    smaller[i; ()] = ();
    smaller[i; (j . rem)] = if[less[i; j]; smaller[i; rem]; (j . smaller[i; rem])];
    : larger[i; a] is a list of the elements of a larger than the integer i.
    larger[i; ()] = ();
    larger[i; (j . rem)] = if[less[i; j]; (j . larger[i; rem]); larger[i; rem]];
    : append[a; b] is the concatenation of the lists a and b.
    append[(); a] = a;
    append[(i . rem); a] = (i . append[rem; a]);
    : if, less, and greater are the standard logical and arithmetic operations.
    if[true; a; b] = a; if[false; a; b] = b;
include lessint.
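The equations transcribe directly into Python; ordinary lists stand in for cons-lists, and smaller/larger partition around the pivot exactly as in the comments above (the transcription is ours).

```python
# Transcription sketch of the Quicksort equations.

def smaller(i, a):
    return [j for j in a if not i < j]     # elements <= i

def larger(i, a):
    return [j for j in a if i < j]         # elements > i

def sort(l):
    if not l:
        return []
    i, rem = l[0], l[1:]
    return sort(smaller(i, rem)) + [i] + sort(larger(i, rem))

assert sort([3, 1, 4, 1, 5, 9, 2, 6]) == [1, 1, 2, 3, 4, 5, 6, 9]
```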
9.4. A Toy Theorem Prover
The fact that sorting a list l with an equational program is equivalent to proving
sort[l] = l' based on certain assumptions, leads one to consider the similarity
between sorting by exchanges, and proving equalities based on commutativity and
associativity. Given the axioms x+y = y+x and (x+y)+z = x+(y+z), we
quickly learn to recognize that for any two additive expressions, E_1 and E_2, con-
taining the same summands possibly in a different order, E_1 = E_2. One way to
formalize that insight is to give a procedure that takes such E_1 and E_2 as inputs,
checks whether they are indeed equal up to commutativity and associativity, and
produces the proof of equality if there is one. The proof of equality of two terms
by commutativity and associativity amounts to a sorting of the summands in one or
both terms, with each application of the commutativity axiom corresponding to an
interchange.
In the following program, compare[a; b] takes two additive expressions a and
b, and produces a proof that a=b from commutativity and associativity, if such a
proof exists. Additive expressions are constructed from numbered variables, with
v[i] representing the ith variable v_i, and the syntactic binary operator plus.
Proofs are represented by lists of expressions, with the convention that each expres-
sion in the list transforms into the next one by one application of commutativity or
associativity to some subexpression. Since equality is symmetric, proofs are correct
whether they are read forwards or backwards. The proof of a=b starts by proving
a=a', where a' has the same variables as a, combined in the standard form
v_i1+(v_i2+( ... +(v_i(n-1)+v_in) ... )), with i_j ≤ i_(j+1). This proof of a=a' is the value of
stand[a]. A similar proof of b=b' is produced by stand[b]. Finally, stand[a] is
concatenated with the reversal of stand[b]. If the standard forms of a and b are
not syntactically identical, then there is a false step in the middle of the con-
catenated proofs, and that step is marked with the special operator falsestep[].
The interesting part of the procedure above is the stand operation, proving
equality of a term with its standard form. That proof works recursively on an
expression a+b, first standardizing a and b individually, then applying the
following transformations to combine the standardized a and b into a single stan-
dard form.
1. (v_i+a)+(v_j+b) associates to v_i+(a+(v_j+b)) when i ≤ j
2. (v_i+a)+(v_j+b) commutes to (v_j+b)+(v_i+a) associates to v_j+(b+(v_i+a))
   when i > j
3. (v_i+a)+v_j associates to v_i+(a+v_j) when i ≤ j
4. (v_i+a)+v_j commutes to v_j+(v_i+a) when i > j
5. v_i+(v_j+b) associates to (v_i+v_j)+b
   commutes to (v_j+v_i)+b associates to v_j+(v_i+b) when i > j
6. v_i+v_j commutes to v_j+v_i when i > j
In cases 1-3 above, more transformations must be applied to subexpressions. In the
following program, the merge operator performs the transformations described
above.
Symbols
: Constructors for lists
cons: 2;
nil: 0;
include integer_numerals;
    : Constructors for additive expressions
    plus: 2;
    v: 1;
    : Errors in proofs
    falsestep: 0;
    : Operators for testing and proving equality under commutativity, associativity
    compare: 2;
    stand: 1;
    merge: 1;
    plusp: 2;
    appendp: 2;
    : Standard list and arithmetic operators
    addend: 2;
    equ: 2;
include truth_values.
For all a, b, c, d, i, j, rem, rem1, rem2:
    compare[a; b] = appendp[stand[a]; stand[b]];
    stand[v[i]] = (v[i]);
    stand[plus[a; b]] = merge[plusp[stand[a]; stand[b]]];
    merge[(a b . rem)] = (a . merge[(b . rem)]);
    merge[(a)] = merge[a];
    : Case 6
    merge[plus[v[i]; v[j]]] =
        if[less[j; i];
            : commute vi and vj
            (plus[v[i]; v[j]] plus[v[j]; v[i]]);
            : no change
            (plus[v[i]; v[j]])];
    : Case 5
    merge[plus[v[i]; plus[v[j]; b]]] =
        if[less[j; i];
            (plus[v[i]; plus[v[j]; b]]
            : associate vi with vj
            plus[plus[v[i]; v[j]]; b]
            : commute vi and vj
            plus[plus[v[j]; v[i]]; b] .
            : associate vi with b
            plusp[(v[j]); merge[plus[v[i]; b]]]);
            : no change
            plusp[(v[i]); merge[plus[v[j]; b]]]];
    merge[plus[plus[v[i]; a]; v[j]]] =
        if[less[j; i];
            : Case 4
            (plus[plus[v[i]; a]; v[j]] .
            : commute vi+a and vj
            plusp[(v[j]); merge[plus[v[i]; a]]]);
            : Case 3
            (plus[plus[v[i]; a]; v[j]] .
            : associate a with vj
            plusp[(v[i]); merge[plus[a; v[j]]]])];
    merge[plus[plus[v[i]; a]; plus[v[j]; b]]] =
        if[less[j; i];
            : Case 2
            (plus[plus[v[i]; a]; plus[v[j]; b]]
            : commute vi+a and vj+b
            plus[plus[v[j]; b]; plus[v[i]; a]] .
            : associate b with vi+a
            plusp[(v[j]); merge[plus[b; plus[v[i]; a]]]]);
            : Case 1
            (plus[plus[v[i]; a]; plus[v[j]; b]] .
            : associate a with vj+b
            plusp[(v[i]); merge[plus[a; plus[v[j]; b]]]])];
    : plusp[p; q] transforms proofs p of E1=F1 and q of E2=F2 into a proof
    : of plus[E1; E2] = plus[F1; F2].
    plusp[(a); rem] = plusp[a; rem];
    plusp[(a b . rem1); (c . rem2)] = (plus[a; c] . plusp[(b . rem1); (c . rem2)]);
    plusp[a; ()] = ()
        where a is either v[i] or plus[b; c] end or end where;
    plusp[a; (b . rem)] = (plus[a; b] . plusp[a; rem])
        where a is either v[i] or plus[b; c] end or end where;
    : appendp[p; q] appends proofs p and the reversal of q, coalescing the last lines
    : if they are the same, and indicating an error otherwise.
    appendp[(a); (b)] = if[equ[a; b]; (a); (a falsestep[] b)];
    appendp[(a b . rem); c] = (a . appendp[(b . rem); c]);
    appendp[(a); (b c . rem)] = addend[appendp[(a); (c . rem)]; b];
    : addend[l; a] adds the element a to the end of the list l.
    addend[(); a] = (a);
    addend[(a . rem); b] = (a . addend[rem; b]);
    : equ is extended to additive terms, as a test for syntactic equality.
    equ[v[i]; v[j]] = equ[i; j];
    equ[plus[a; b]; plus[c; d]] = and[equ[a; c]; equ[b; d]];
    equ[v[i]; plus[c; d]] = false;
    equ[plus[a; b]; v[j]] = false;
    include equint;
    : if, and, less are standard operators.
    if[true; a; b] = a; if[false; a; b] = b;
    and[a; b] = if[a; b; false];
    include lessint.
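The comparison underlying the prover can be sketched by flattening each additive expression into its summands: two expressions are equal under commutativity and associativity exactly when their multisets of summands agree. The equational program does more, recording each commute and associate step as a line of the proof; this Python sketch (with our own tuple encoding) checks only the equality itself.

```python
# Sketch of equality up to commutativity and associativity: an additive
# expression is ('v', i) or ('plus', a, b).

def summands(e):
    if e[0] == "v":
        return [e[1]]
    return summands(e[1]) + summands(e[2])

def compare(a, b):
    # equal under commutativity/associativity iff same multiset of summands
    return sorted(summands(a)) == sorted(summands(b))

e1 = ("plus", ("v", 2), ("plus", ("v", 1), ("v", 3)))
e2 = ("plus", ("plus", ("v", 3), ("v", 2)), ("v", 1))
assert compare(e1, e2)
assert not compare(e1, ("plus", ("v", 1), ("v", 2)))
```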
9.5. An Unusual Adder
The following example gives a rather obtuse way to add two numbers. The intent
of the example is to demonstrate a programming technique supported by the equa-
tion interpreter, but not by LISP, involving the definition of infinite structures. We
hope that this silly example will clarify the technique, while more substantial
examples in Sections 9.7, 15.3 and 15.4 will show its value in solving problems
elegantly. The addition program below uses an infinite list of infinite lists, in which
the ith member of the jth list is the integer i+j. In order to add two nonnegative
integers, we select the answer out of this infinite addition table. The outermost
evaluation strategy guarantees that only a finite portion of the table will actually be
produced, so that the computation will terminate.
Symbols
    : List constructors.
    cons: 2;
    nil: 0;
include integer_numerals;
: List utilities.
element: 2;
    first: 1;
tail: 1;
inclist: 1;
: Standard arithmetic operators.
add, subtract, equ: 2;
    if: 3;
include truth_values;
: Unusual integer list, addition table and operator.
intlist: 0;
addtable: 0;
weirdadd: 2.
For all i, j, l, x:
    first[(x . l)] = x;
    tail[(x . l)] = l;
    element[i; l] = if[equ[i; 0];
        first[l];
        element[subtract[i; 1]; tail[l]]];
    weirdadd[i; j] = element[i; element[j; addtable[]]];
    addtable[] = (intlist[] . inclist[addtable[]]);
    intlist[] = (0 . inclist[intlist[]]);
    inclist[i] = add[i; 1] where i is in integer_numerals end where;
    inclist[(x . l)] = (inclist[x] . inclist[l]);
include addint, subint, equint.
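The infinite addition table can be sketched in Python with generators, which play the role of outermost (lazy) evaluation here: only the finite part of the table that is actually demanded ever gets built. The sketch is ours, not a translation produced by any tool.

```python
# Sketch of the infinite addition table using lazy generators.
from itertools import count, islice

def intlist(start=0):
    return count(start)              # start, start+1, start+2, ... (lazy)

def addtable():
    j = 0
    while True:
        yield intlist(j)             # the jth list: j, j+1, j+2, ...
        j += 1

def element(i, l):
    return next(islice(l, i, None))  # the ith member of lazy list l

def weirdadd(i, j):
    return element(i, element(j, addtable()))

assert weirdadd(3, 4) == 7
```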
9.6. Arbitrary-Precision Integer Operations
The equation interpreter implementation provides the usual integer operations as
primitives, when these operations are applied to integers that may be represented in
single precision, and when the result of the operation is also single precision. In
order to provide arbitrary precision integer operations, we extend these primitive
sets of equations with some additional explicit equations.
The following equations define arbitrary-precision arithmetic on positive
integers in a straightforward way. A large base is chosen, for example base
2^15 = 32768, and the constructor extend is used to represent large numbers, with the
understanding that extend(x,i) represents x*base+i, for 0 ≤ i < base. longadd
and longmult are the binary operators for addition and multiplication. Addition
follows the grade school algorithm of adding digits from right to left, keeping track
of a carry. Multiplication also follows the usual algorithm for hand calculation,
adding up partial products produced by multiplying one digit of the second multi-
plicand with the entire first multiplicand.
Symbols
: Constructors for arbitrary-precision integers.
extend: 2;
include integer_numerals;
: Base for arithmetic.
base: 0;
: Single precision arithmetic operators.
add: 2;
multiply: 2;
: Arbitrary-precision arithmetic operators.
longadd: 2;
longmult: 2;
: Operators used in defining the arithmetic operators.
sum: 3;
acarry: 3;
carryadd: 3;
prod: 3;
    mcarry: 3;
carrymult: 3.
For all x, y, z, i, j, k:
base() = 32768;
longadd(x, y) = carryadd(x, y, 0);
carryadd(i, j, k) = if(equ(acarry(i, j, k), 0),
sum(i, j, k),
extend(acarry(i, j, k), sum(i, j, k)))
where i, j are in integer_numerals end where;
carryadd(extend(x, i), j, k) = carryadd(extend(x, i), extend(0, j), k)
where j is in integer_numerals end where;
carryadd(i, extend(y, j), k) = carryadd(extend(0, i), extend(y, j), k)
where i is in integer_numerals end where;
carryadd(extend(x, i), extend(y, j), k) =
    extend(carryadd(x, y, acarry(i, j, k)), sum(i, j, k));
sum(i, j, k) = modulo(add(i, add(j, k)), base());
acarry(i, j, k) = divide(add(i, add(j, k)), base());
longmult (x, j) = carrymult(x, j, 0)
where j is in integer_numerals end where;
longmult(x, extend(y, j)) =
longadd(carrymult(x, j, 0), extend(longmult(x, y), 0));
carrymult(i, j, k) = if(equ(mcarry(i, j, k), 0),
    prod(i, j, k),
    extend(mcarry(i, j, k), prod(i, j, k)))
    where i is in integer_numerals end where;
carrymult(extend(x, i), j, k) =
    extend(carrymult(x, j, mcarry(i, j, k)), prod(i, j, k));
prod(i, j, k) = modulo(add(multiply(i, j), k), base());
mcarry(i, j, k) = divide(add(multiply(i, j), k), base());
include addint, divint, modint, multint.
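The long-addition scheme can be sketched in Python: a big number is either a single digit 0 ≤ d < base or ('extend', x, d) meaning x*base + d. The transcription (and any errors in it) is ours, intended only to make the carry propagation concrete.

```python
# Sketch of carryadd/longadd on base-32768 digit chains.

BASE = 32768

def carryadd(x, y, k):
    if isinstance(x, int) and isinstance(y, int):
        s = x + y + k
        return s % BASE if s < BASE else ("extend", s // BASE, s % BASE)
    if isinstance(x, int):
        x = ("extend", 0, x)         # pad the shorter number with a 0 digit
    if isinstance(y, int):
        y = ("extend", 0, y)
    _, xh, i = x
    _, yh, j = y
    s = i + j + k
    return ("extend", carryadd(xh, yh, s // BASE), s % BASE)

def longadd(x, y):
    return carryadd(x, y, 0)

def to_int(n):                       # for checking only
    return n if isinstance(n, int) else to_int(n[1]) * BASE + n[2]

a = ("extend", 1, 2)                 # represents 1*32768 + 2
assert to_int(longadd(a, a)) == 2 * (BASE + 2)
assert to_int(longadd(BASE - 1, 1)) == BASE
```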
The simple equational program above has several objectionable qualities.
First, in order to make the operations sum, acarry, prod, and mcarry really work,
we must choose a base much smaller than the largest integer representable in sin-
gle precision. In particular, to allow evaluation of prod and mcarry in all cases,
the base used by the program may not exceed the square root of full single preci-
sion. This use of a small base doubles the sizes of multiple-precision integer
representations. By redefining the offending operators, we may allow full use of
single precision, but only at the cost of a substantial additional time overhead. For
example, sum would have to be defined as
sum(i, j, k) = addmod(i, addmod(j, k, base()), base());
addmod(i, j, k) = if(less(i, subtract(k, j)),
add(i, j),
add(subtract(max(i, j), k), min(i, j)));
and prod would be even more complex, because of the possibility of zeroes. Even if
we accept the reduced base for arithmetic, extra time is required to add or multiply
two single-precision numbers with a single-precision result, since we must check for
the nonexistent carry. Finally, it is distasteful to introduce new operators longadd
and longmult when our intuitive idea is to extend add and multiply.
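The carry-propagation scheme above can be sketched in ordinary code. The following Python fragment is our own illustrative model, not part of the equation interpreter; the names digit_sum, digit_carry, and longadd are ours, mirroring sum, acarry, and carryadd on little-endian digit lists:

```python
# Illustrative Python model (ours) of the carry scheme above: digits are
# stored least-significant-first, and BASE mirrors base() = 32768.
BASE = 32768

def digit_sum(i, j, k):        # plays the role of sum(i, j, k)
    return (i + j + k) % BASE

def digit_carry(i, j, k):      # plays the role of acarry(i, j, k)
    return (i + j + k) // BASE

def longadd(xs, ys):
    """Add two little-endian digit lists, propagating carries."""
    n = max(len(xs), len(ys))
    xs = xs + [0] * (n - len(xs))    # like padding with extend(0, ...)
    ys = ys + [0] * (n - len(ys))
    out, carry = [], 0
    for i, j in zip(xs, ys):
        out.append(digit_sum(i, j, carry))
        carry = digit_carry(i, j, carry)
    if carry:
        out.append(carry)            # an extra extend(...) digit
    return out
```

For example, longadd([32767], [1]) yields [0, 1], the base-32768 representation of 32768, just as carryadd wraps a final carry in extend.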
In order to avoid these objections, we need slightly better support from the
predefined operations add and multiply. When faced with an expression
add(α, β), where α and β are single-precision numerals, but their sum requires dou-
ble precision, the current version of the equation interpreter merely declines to
reduce. In order to support multiple-precision arithmetic cleanly and efficiently,
the implementation must be modified so that add and multiply produce results
whenever their arguments are single-precision numerals, even if the results require
double precision. The double precision results will be represented by use of the
extend operator in the program above. The only technical problem to be solved in
providing this greater support is the syntactic one: what status does the extend
operator have: need it be declared by the user? This syntactic problem is a spe-
cial case of a very general need for modular constructs, including facilities to com-
bine sets of equations and to hide certain internally meaningful symbols from the
user. Rather than solve the special case, we have postponed this important
improvement until the general problem of modularity is solved (see Section 14).
Given an appropriate improvement to the predefined arithmetic operations,
arbitrary precision may be provided by the following equational program. lowdigit
picks off the lowest order digit of an extended-precision numeral, highdigits pro-
duces all but the lowest order digit.
Symbols
: Constructors for arbitrary-precision integers.
extend: 2;
include integer_numerals;
: Arithmetic operators.
add: 2;
multiply: 2;
: Operators used in defining the arithmetic operators.
lowdigit: 1;
highdigits: 1.
For all x, y, z, i, j, k:
add (extend (x, i), extend(y, j)) = add(extend(add(x, y), 0), add(i, j));
add(extend(x, i), j) = extend(add(x, highdigits(add(i, j))), lowdigit(add(i, j)))
where j is in integer_numerals end where;
add(i, extend(y, j)) = extend(add(y, highdigits(add(i, j))), lowdigit(add(i, j)))
where i is in integer_numerals end where;
lowdigit(extend(x, i)) = i;
highdigits(extend(x, i)) = x;
multiply (x, extend(y, j)) = add(multiply(x, j), extend(multiply(x, y), 0));
multiply (extend(x, i), j) = add(multiply(i, j), extend(multiply(x, j), 0))
where j is in integer_numerals end where;
include addint, multint.
This improved equational program answers all of the objections to the first one, and
is substantially simpler. Notice that, whenever an operation is applied to single-
precision arguments, yielding a single-precision result, only the predefined equa-
tions for the operation are applied, so there is no additional time overhead for those
operations. Negative integers may be handled through a negate operator, or by
negating every digit in the representation of a negative number. The second solu-
tion wastes one bit in each digit, but is more compact for single-precision negative
numbers, and avoids additional time overhead for operations on single-precision
negative numbers.
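The extend-based representation can be made concrete with a small Python model of the improved add. This is our own sketch, not interpreter syntax: BASE is shrunk to 10 so carries are visible, and prim_add stands in for the improved predefined add that may return a double-precision extend result.

```python
# Illustrative model (ours) of the improved scheme: a number is a
# single-precision int or a pair ('extend', high, low).
BASE = 10   # shrunk "single precision" so carries are easy to see

def prim_add(i, j):
    """The improved predefined add: may return a double-precision result."""
    t = i + j
    return ('extend', t // BASE, t % BASE) if t >= BASE else t

def lowdigit(v):
    return v[2] if isinstance(v, tuple) else v

def highdigits(v):
    return v[1] if isinstance(v, tuple) else 0

def add(a, b):
    if not isinstance(a, tuple) and not isinstance(b, tuple):
        return prim_add(a, b)            # only the predefined equations fire
    if isinstance(a, tuple) and isinstance(b, tuple):
        (_, x, i), (_, y, j) = a, b
        return add(('extend', add(x, y), 0), add(i, j))
    if isinstance(a, tuple):             # add(extend(x, i), j)
        _, x, i = a
        s = add(i, b)
        return ('extend', add(x, highdigits(s)), lowdigit(s))
    return add(b, a)                     # symmetric case

def val(v):
    """Decode a representation back to an ordinary integer (for checking)."""
    return v if not isinstance(v, tuple) else val(v[1]) * BASE + v[2]
```

As in the equational program, single-precision arguments with a single-precision result never leave prim_add, while overflow grows a nested extend: val(add(('extend', 9, 9), 1)) gives 100.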
9.7. Exact Addition of Real Numbers
Another interesting example of programming with infinite lists involves exact com-
putations on the constructive real numbers [Bi67, Br79, My72]. In principle, a
constructive real number is a program enumerating a sequence of rational intervals
converging to a real number. Explicit use of these intervals is quite clumsy com-
pared to the more intuitive representation of reals as infinite decimals. Unfor-
tunately, addition is not computable over infinite decimals. Suppose we try to add
0.99··· to 1.00···. No matter how many digits of the two summands we have
seen, we cannot decide for sure whether the sum should be 1.··· or 2.···. If
the sequence of 9s in the first summand ever drops to a lower digit, then the sum
must be of the form 1.···; if the sequence of 0s in the second summand ever rises
to a higher digit, then the sum must be of the form 2.···. As long as the 9s and
0s continue, we cannot reliably produce the first digit of the sum. Ironically, in
exactly the case where we can never decide whether to use 1.··· or 2.···, either
one would be right, since 1.99··· = 2.00···. One aspect of the problem is that
conventional infinite decimal notation allows multiple representations of certain
numbers, such as 1.99··· = 2.00···, but requires a unique representation of
others, such as 0.11···. The solution is to generalize the notation so that every
number has multiple representations, by allowing individual digits to be negative as
well as positive. This idea was proposed for bit-serial operations on varying-
precision real numbers [Av61, At75, O179].
Let the infinite list of integers (d_0 d_1 d_2 ···) be used to represent the real
number Σ_{i=0}^∞ d_i*10^(-i). d_0 is the integer part, and there is an implicit decimal point
between d_0 and d_1. In conventional base 10 notation, each d_i for i ≥ 1 would be
limited to the range [0,9]. Suppose that the range [-9,+9] is used instead
([-5,+5] suffices, in fact, but leads to a clumsier program). As a result, every real
number has more than one representation. In particular, the intervals correspond-
ing to finite decimals overlap, so that every real number is in the interior of arbi-
trarily small intervals. For a conventional finite decimal a, let I_{0,9}(a) denote the
interval of real numbers having conventional representations beginning with a.
Similarly, I_{-9,9}(a) denotes the interval of real numbers having extended representa-
tions beginning with a, where a is a finite decimal with digits from -9 to 9.
The problem with the conventional notation is that certain real numbers do
not lie in the interiors of any small intervals I_{0,9}(a), but only on the endpoints.
When generating the decimal expansion of a real number x, it is not safe to specify
an interval with x at an endpoint, since an arbitrarily small correction to x may
take us out of that interval. For example, 1.1 is only an endpoint of the intervals
I_{0,9}(1.1) = [1.1, 1.2], I_{0,9}(1.10) = [1.10, 1.11], I_{0,9}(1.100) = [1.100, 1.101], etc., and the
smallest interval I_{0,9}(a) with 1.1 in its interior is I_{0,9}(1) = [1, 2]. On the other hand,
1.1 is in the interior of each interval I_{-9,9}(1.1) = [1, 1.2], I_{-9,9}(1.10) = [1.09, 1.11],
I_{-9,9}(1.100) = [1.099, 1.101], etc., because the larger number of digits stretches
these intervals to twice the width of the conventional ones, yielding enough overlaps
of intervals to avoid the singularities of the conventional notation.
The notation described above is a fixed point notation. Infinite floating point
decimals may also be defined, allowing d_0 to be restricted to the range [-9,+9] as
well. Such an extension of the notation makes the programs for arithmetic opera-
tions more complex, but does not introduce any essential new ideas. Conventional
computer arithmetic on real numbers truncates the infinite representation of a real
to some finite precision. The equation interpreter is capable of handling infinite
lists, so, except for the final output, it may manipulate exact real numbers.
In order to program addition of infinite-precision real numbers, as described
above, we mimic a program in which the two input numbers are presented to a pro-
cess called addlist, which merely adds corresponding elements in the input lists to
produce the output list. Notice that the output from addlist represents the real
sum of the two inputs, but has digits in the range [-18,+18]. The output from
addlist goes to a process called compress, which restores the list elements to the
range —9 to +9. The output from compress is the desired result. The function
add is defined by the composition of addlist and compress. Notice that, while
addlist produces one output digit for every input pair, compress must see more
than one input digit in order to produce a single output digit. Looking at d_i and
d_{i+1}, where d_i has already been compressed to the range [-8,+8], compress
adjusts d_{i+1} into the range [-8,+8] by adding or subtracting 10, if necessary, and
compensating by adjusting d_i by ±1. Notice that it is important to first place a
digit in [-8,+8], so that it may be used in the adjustment of the next digit and
stay in [-9,+9].
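The addlist/compress pipeline just described translates naturally into Python generators. This is our own procedural sketch of the idea, not the interpreter's notation; a real number is modeled as an infinite iterator of signed digits.

```python
# Generator model (ours) of addlist and compress: a real number is an
# infinite iterator of digits d0, d1, ... with each digit in [-9, +9].
def addlist(xs, ys):
    for i, j in zip(xs, ys):
        yield i + j                  # digit sums land in [-18, +18]

def compress(ds):
    """Restore digits to [-9, +9].  The held digit i is kept in [-8, +8]
    until its successor j has been inspected, so the ±1 adjustment from
    j can never push an emitted digit outside [-9, +9]."""
    i = next(ds)
    for j in ds:
        if j < -8:
            yield i - 1
            i = j + 10
        elif j > 8:
            yield i + 1
            i = j - 10
        else:
            yield i
            i = j

def add(xs, ys):
    return compress(addlist(xs, ys))
```

Adding the streams for 0.999··· and 1.000··· produces the stream 2, 0, 0, ···, exactly the case that defeats conventional decimal notation.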
In order to use the addition procedure, we need to provide some interesting
definitions of infinite-precision real numbers, and also a function called standard to
produce the output in conventional base-10 with finite precision. standard takes
two arguments: a single integer i, and an infinite-precision real r. The result is the
standard base-10 representation of r to i significant digits. Notice that standard
may require i+1 digits of input in order to produce the first digit of output.
Symbols
: List constructors
cons: 2;
nil: 0;
: List manipulation operators
first: 1;
tail: 1;
: Real arithmetic operator
add: 2;
: Other operators needed in the definitions
addlist: 2;
compress: 1;
stneg, stpos: 3;
revneg, revpos: 2;
rotate: 1;
: Input and output operators
repeat: 2;
standard: 2;
: Standard arithmetic and logical operators and symbols.
if: 3;
equ, less, greater: 2;
include integer_numerals, truth_values.
For all x, y, i, j, k, l, a, b:
: add is extended to infinite-precision real numbers.
add[(i . x); (j . y)] = compress[addlist[(i . x); (j . y)]];
: addlist adds corresponding elements of infinite lists.
addlist[(i . x); (j . y)] = (add[i; j] . addlist[x; y]);
: compress normalizes an infinite list of digits in the range [-18, +18]
: into the range [-9, +9].
compress[(i j . x)] =
if[less[j; -8]; (subtract[i; 1] . compress[(add[j; 10] . x)]);
if[greater[j; 8]; (add[i; 1] . compress[(subtract[j; 10] . x)]);
(i . compress[(j . x)])]];
: repeat[x; y] is the infinite extension of the decimal expansion x
: by repeating the finite sequence y.
repeat[(i . x); y] = (i . repeat[x; y]);
repeat[(); y] = (first[y] . repeat[(); rotate[y]]);
: rotate[y] rotates the first element of the list y to the end.
rotate[(i)] = (i);
rotate[(i j . x)] = (j . rotate[(i . x)]);
: standard[i; a] is the normal base 10 expansion of the first i digits of a.
standard[j; (i . a)] = if[equ[j; 0]; ();
if[equ[i; 0]; (0 . standard[subtract[j; 1]; a]);
if[less[i; 0]; stneg[j; (i . a); ()];
stpos[j; (i . a); ()]]]];
: stneg[i; a; b] translates the first i digits of a into normal base 10 notation,
: backwards, and appends b, assuming that a is part of a negative number.
: stpos[i; a; b] does the same thing for positive numbers.
stneg[j; (i . a); b] = if[equ[j; 0];
revneg[b; ()];
stneg[subtract[j; 1]; a; (i . b)]];
stpos[j; (i . a); b] = if[equ[j; 0];
revpos[b; ()];
stpos[subtract[j; 1]; a; (i . b)]];
: revneg[a; b] reverses the finite decimal expansion a, borrowing and carrying so as
: to make each digit nonpositive, finally appending the list b.
: revpos[a; b] does the same, making each digit nonnegative.
revneg[(); b] = b;
revneg[(i . a); b] = if[less[i; 1];
revneg[a; (i . b)];
revneg[(add[first[a]; 1] . tail[a]);
(add[i; -10] . b)]];
revpos[(); b] = b;
revpos[(i . a); b] = if[less[i; 0];
revpos[(add[first[a]; -1] . tail[a]);
(add[i; 10] . b)];
revpos[a; (i . b)]];
: first, tail, if, add, subtract, equ, less are standard list,
: conditional, and arithmetic operators.
first[(i . a)] = i; tail[(i . a)] = a;
if[true; x; y] = x; if[false; x; y] = y;
greater[i; j] = less[subtract[0; i]; subtract[0; j]]
where i, j are in integer_numerals end where;
include addint, subint, equint, lessint.
In this example, producing output in standard form was much more difficult than
performing the addition. Other arithmetic operations, however, such as multiplica-
tion, are much more difficult to program.
9.8. Polynomial Addition
Polynomial addition is a commonplace program in LISP, with polynomials
represented by lists of coefficients. The equation interpreter allows polynomial
sums to be computed in the same notation that we normally use to write polynomi-
als, with no distinction between the operator add that applies to integer numerals,
the operator add that applies to polynomials, and the operator add used to con-
struct polynomials. In LISP, the first would be PLUS, the second a user defined
function, perhaps called POLYPLUS and the third would be encoded by a particu-
lar use of cons.
In effect, the equational programs shown below for "adding polynomials" are
really just simplifying polynomial expressions into a natural canonical form. The
Horner-rule form for a polynomial of degree n in the variable X is
c_0+X*(c_1+X*(··· +X*c_n ···)). The list (c_0 c_1 ··· c_n), typically used to
represent the same polynomial in LISP, is a very simple encoding of the Horner-
rule form. In the equation interpreter, we may use the Horner-rule form literally.
The resulting program simplifies terms of the form π_1+π_2, where each of π_1 and π_2
is in Horner-rule form, to an equivalent Horner-rule form. Notice that the symbol
add in the following programs may be an active symbol or a static constructor for
polynomials, depending on context. Also, notice that the variable X over which the
polynomials are expressed is not a variable with respect to the equational program,
but an atomic_symbol.
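Before giving the equational program, it may help to see the same simplification in conventional code. The following Python sketch is our own illustration (the representation and the name padd are ours): a Horner form is either a bare integer coefficient or a pair (c, rest) standing for c + X*rest, and padd mirrors the case analysis of the equations that follow.

```python
# A Horner-rule polynomial is an int (the innermost coefficient) or a
# pair (c, rest) meaning c + X*rest.  padd mirrors the equational
# program's case analysis on which arguments carry a multiply(X, ...) part.
def padd(p, q):
    if isinstance(p, int) and isinstance(q, int):
        return p + q                     # plain integer addition (addint)
    if isinstance(p, int):
        c, rest = q
        return (p + c, rest)             # constant + (j + X*b)
    if isinstance(q, int):
        c, rest = p
        return (c + q, rest)             # (i + X*a) + constant
    (i, a), (j, b) = p, q
    return (i + j, padd(a, b))           # add constants, recurse under X
```

For example, padd((1, (2, 3)), (1, (2, -3))) gives (2, (4, 0)), the analogue of reducing (1+X*(2+X*3))+(1+X*(2+X*(-3))) to 2+X*(4+X*0); like the first equational program below, it keeps the high-order 0.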
Symbols
add: 2;
multiply: 2;
include integer_numerals, atomic_symbols.
For all i, j, a, b:
add(add(i, multiply(X, a)), add(j, multiply(X, b))) =
add(add(i, j), multiply(X, add(a, b)))
where i, j are in integer_numerals end where;
add(i, add(j, multiply(X, b))) =
add(add(i, j), multiply(X, b))
where i, j are in integer_numerals end where;
add(add(i, multiply(X, a)), j) =
add(add(i, j), multiply(X, a))
where i, j are in integer_numerals end where;
include addint.
The program above is satisfyingly intuitive, but does not remove high-order 0
coefficients. Thus, (1+X*(2+X*3))+(1+X*(2+X*(-3))) reduces to 2+X*(4+X*0)
instead of the more helpful 2+X*4. Getting rid of the high-order zeroes is tricky,
since the natural equations X*0 = 0 and a+0 = a suffer from overlaps with the
other equations. One solution, shown below, is to check for zeroes before construct-
ing a Horner-rule form, rather than eliminating them afterwards.
Symbols
add: 2;
multiply: 2;
if: 3;
equ: 2;
and: 2;
include integer_numerals, atomic_symbols, truth_values.
For all i, j, a, b, c, d:
add(add(i, a), add(j, b)) = add(add(i, j), add(a, b))
where i, j are in integer_numerals,
a, b are multiply(c, d)
end where;
add(i, add(j, b)) = add(add(i, j), b)
where i, j are in integer_numerals,
b is multiply(c, d)
end where;
add(add(i, a), j) = add(add(i, j), a)
where i, j are in integer_numerals,
a is multiply(c, d)
end where;
add(multiply(X, a), multiply(X, b)) =
if(equ(add(a,b), 0), 0, multiply (X, add(a, b)));
equ(add(i, multiply(X, a)), add(j, multiply(X, b))) =
and(equ(i, j), equ(a, b))
where i, j are in integer_numerals end where;
equ(i, add(j, multiply(X, b))) = false
where i, j are in integer_numerals end where;
equ(add(i, multiply (X, a)), j) = false
where i, j are in integer_numerals end where;
if(true, a, b) = a; if(false, a, b) = b;
and(a, b) = if(a, b, false);
include addint, equint.
It is amusing to consider other natural forms of polynomials, such as the
power-series form, c_0*X^0 + c_1*X^1 + ··· + c_n*X^n. This corresponds to the representa-
tion of polynomials as lists of exponent-coefficient pairs. For dense polynomials,
the exponents waste space, but for sparse polynomials the omission of internal
zeroes may make up for the inclusion of exponents, as in 1+X^n for large n. A nice equa-
tional programming challenge is to produce an elegant program for addition of
polynomials in power-series form.
9.9. The Combinator Calculus
Weak reduction in the combinator calculus [CF58, St72] is a natural sort of com-
putation to describe with equations. The following equations use the Lambda syn-
tax of Section 4.3 to allow the abbreviation
(a_1 a_2 ··· a_n)
for the expression
AP(AP( ··· AP(a_1, a_2), ··· ), a_n).
The symbol Lambda from Section 4.3 is not used in the combinator calculus.
Symbols
AP: 2;
S, K, I: 0;
include atomic_symbols.
For all x, y, z:
(S x y z) = (x z (y z));
(K x y) = x;
(I x) = x.
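A leftmost-outermost reducer for these three rules is only a few lines in a conventional language. This Python model is our own sketch (the tuple encoding is ours): applications are ('AP', f, a) pairs and S, K, I are strings.

```python
# Leftmost-outermost weak reduction for S, K, I (our illustrative model).
def step(t):
    """One leftmost-outermost reduction step; returns (term, reduced?)."""
    if isinstance(t, tuple):
        _, f, x = t
        if f == 'I':                                  # (I x) = x
            return x, True
        if isinstance(f, tuple) and f[1] == 'K':      # (K x y) = x
            return f[2], True
        if (isinstance(f, tuple) and isinstance(f[1], tuple)
                and f[1][1] == 'S'):                  # (S x y z) = (x z (y z))
            s_x, s_y, s_z = f[1][2], f[2], x
            return ('AP', ('AP', s_x, s_z), ('AP', s_y, s_z)), True
        g, r = step(f)                                # no head redex:
        if r:                                         # reduce leftmost first
            return ('AP', g, x), True
        g, r = step(x)
        return (('AP', f, g), True) if r else (t, False)
    return t, False

def normalize(t):
    r = True
    while r:
        t, r = step(t)
    return t
```

For instance, normalize of (S K K a), encoded as nested AP pairs, yields 'a'; as with the equation interpreter, the outermost strategy reaches the normal form whenever one exists.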
This example, and the polynomial addition of Section 9.8, differ from the first
ones in that the only symbol that can construct complex expressions, AP, appears
(implicitly) at the head of left-hand sides of equations. In many interesting sys-
tems of terms, there are one or more symbols that do not appear at the heads of
left-hand sides, so that they may be used to construct structures that are stable
with respect to reduction. These stable structures may be analyzed and rearranged
by other operators. For example, in LISP, the symbol cons is a constructor, and
an expression made up only of cons and atomic symbols (i.e., an S-expression) is
always in normal form. It is helpful to notice the existence of constructors when
they occur, but the example above illustrates the usefulness of allowing systems
without constructors. The use of constructors is discussed further in Section 12.1.
9.10. Beta Reduction in the Lambda Calculus
This example should only be read by a user with previous knowledge of the lambda
calculus. The reader needs to read de Bruijn’s article [deB72] in order to under-
stand the treatment of variables. The object is to reduce an arbitrary lambda term
to normal form by a sequence of β-reductions. A number of rather tricky prob-
lems are encountered, but some of the usual problems encountered by other imple-
mentations of β-reduction are avoided by use of the equation interpreter. This
example uses the Lambda notation of Section 4.3.
The lambda calculus of Church [Ch41] presents several sticky problems for
the design of an evaluator. First, the problem of capture of bound variables
appears to require the use of the α-rule
\x.E → \y.E[y/x]
to change bound variables. There is no simple way to generate the new variables
needed for application of the α-rule in an equational program. In more conven-
tional languages, variable generation is simple, but its presence clutters up the pro-
gram, and causes the outputs to be hard to read.
De Bruijn [deB72] gives a notation for lambda terms in which an occurrence
of a bound variable is represented by the number of lambda bindings appearing
between it and its binding occurrence. This notation allows a simple and elegant
solution of the technical problem of capture, but provides an even less readable out-
put. We represent each bound variable by a term var[x,i], where x is the name of
the variable and i is the de Bruijn number. The first set of equations below
translates a lambda term in traditional notation into this modified de Bruijn nota-
tion. De Bruijn notation normally omits the name of the bound variable immedi-
ately after an occurrence of lambda, but we retain the name of the variable for
readability. We write the de Bruijn form of a lambda binding as \:x.E (the x
appears directly as an argument to the lambda) to distinguish from the traditional
notation \x.E (in which the first argument to lambda is the singleton list contain-
ing x).
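The translation just described is easy to prototype. In this Python sketch (the encoding is ours: atoms are strings, terms are ('lam', x, body) and ('ap', f, a), and numbered variables are ('var', x, i)), debruijn plays the role of the first set of equations below:

```python
# Sketch (our encoding) of the bindvar translation: bound occurrences
# become ('var', name, n) with n the de Bruijn number, while the bound
# variable's name is retained for readability.
def bindvar(t, y, i):
    if isinstance(t, str):                 # an atomic symbol
        return ('var', t, i) if t == y else t
    if t[0] == 'var':                      # already numbered: leave alone
        return t
    if t[0] == 'ap':
        return ('ap', bindvar(t[1], y, i), bindvar(t[2], y, i))
    return ('lam', t[1], bindvar(t[2], y, i + 1))   # pass one binding

def debruijn(t):
    """Translate a traditional term into the modified de Bruijn notation."""
    if isinstance(t, str):
        return t
    if t[0] == 'ap':
        return ('ap', debruijn(t[1]), debruijn(t[2]))
    x, body = t[1], t[2]
    return ('lam', x, bindvar(debruijn(body), x, 0))
```

Thus \x.\y.(x y) becomes \:x.\:y.(var[x,1] var[y,0]): one lambda binding stands between the occurrence of x and its binder, none between y and its binder.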
Symbols
: Operators constructing lambda expressions
Lambda: 2;
AP: 2;
: Constructors for lists
cons: 2;
nil: 0;
: var[x, i] represents the variable with de Bruijn number i, named x.
var: 2;
: bindvar carries binding instances of variables to the corresponding bound
: instances, computing de Bruijn numbers on the way.
bindvar: 3;
: Arithmetical and logical operators and symbols
if: 3;
equ: 2;
add: 2;
include atomic_symbols, truth_values, integer_numerals.
For all x, y, E, F, i, j:
: Multiple-argument lambda bindings are broken into sequences of lambdas.
\x y:rem.E = \x.\y:rem.E;
: Single-argument lambda bindings are encoded in de Bruijn notation.
\x.E = \:x.bindvar[E, x, 0];
: bindvar[E, x, i] attaches de Bruijn numbers to all free instances of the variable
: x in the lambda-term E, assuming that E is embedded in exactly i
: lambda bindings within the binding instance of x.
bindvar[x, y, i] = if[equ[x, y], var[x, i], x]
where x is in atomic_symbols end where;
bindvar[var[x, j], y, i] = var[x, j];
bindvar[(E F), y, i] = (bindvar[E, y, i] bindvar[F, y, i]);
bindvar[\:x.E, y, i] = \:x.bindvar[E, y, add[i, 1]]
where x is in atomic_symbols end where;
: if is the standard conditional function, equ the standard equality test, and add
: the standard addition operator on integers.
if[true, E, F] = E; if[false, E, F] = F;
include equatom, addint, equint.
In order to perform evaluation of a lambda term in de Bruijn notation, the
transformation described above must be done logically prior to the actual β-
reduction steps. In principle, equations for de Bruijn notation and β-reduction
could be combined into one specification, but it seems to be rather difficult to avoid
overlapping left-hand sides in such a combined specification (see Section 5, restric-
tion 4). At any rate, it makes logical sense to think of the transformation to
de Bruijn notation as a syntactic preprocessing step, rather than part of the seman-
tics of β-reduction. Therefore, we built a custom syntactic preprocessor for the β-
reduction equations. After executing
loadsyntax Lambda
int.in (the syntactic preprocessor for the command ei) is the shell script:
#! /bin/sh
SYSTEM/Syntax/Outersynt/Lambda/int.in SYSTEM |
SYSTEM/Syntax/Common/int.in.trans SYSTEM |
SYSTEM/Syntax/Common/int.in.fin SYSTEM ;
where SYSTEM is the directory containing equational interpreter system libraries,
differing in different installations. We edited int.in to look like
#! /bin/sh
SYSTEM/Syntax/Outersynt/Lambda/int.in SYSTEM |
SYSTEM/Syntax/Common/int.in.trans SYSTEM |
SYSTEM/Syntax/Common/int.in.fin SYSTEM |
DEBRUIJN/Interpreter ;
where DEBRUIJN is the directory in which we constructed the transformation to
de Bruijn notation. We did not change pre.in (the syntactic preprocessor for ep)
or int.out (the pretty-printer for ei).
Even with the elegant de Bruijn notation, two technical problems remain.
First, the official definition of β-reduction:
(\x.E F) → E[F/x]
cannot be written as a single equation, since the equation interpreter has no nota-
tion for syntactic substitution (see [K180a] for a theoretical discussion of term
rewriting systems with substitution). The obvious solution to this problem is to
introduce a symbol for substitution, and define its operation recursively with equa-
tions. A nice version of this solution is given by Staples [St79], along with a proof
that leftmost-outermost evaluation is optimal for his rules. For notational econ-
omy, we take advantage of the fact that the lambda term \x.E may be used to
represent syntactic substitution, so that no explicit symbol for substitution is
required. Combining this observation with Staples’ rules, we produced the follow-
ing recursive version of β-reduction:
(\x.x G) → G
(\x.y G) → y where x and y are different variables
(\x.(E F) G) → ((\x.E G) (\x.F G))
(\x.\y.E G) → \y.(\x.E G)
These rules may be translated straightforwardly into the de Bruijn notation, and
written as equations, using a conditional function and equality test to combine the
first two rules into one equation. In the de Bruijn form, occurrences of lambda
must be annotated with integers indicating how many other instances of lambda
have been passed in applications of the fourth rule above. Otherwise, there would
be no way to recognize the identity of the two instances of the same variable in the
first rule. Initially and finally, all of these integer labels on lambdas will be 0; only
in intermediate steps of substitution will they acquire higher values.
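For readers who want a reference point for checking the four rules above, the following Python evaluator uses the conventional explicit shift-and-substitute formulation of de Bruijn β-reduction. It is our own independent sketch for comparison, not the substitution-free rules of this section, and it uses pure integer indices with no variable names.

```python
# Conventional de Bruijn evaluator (ours, for comparison):
# terms are an int (an index), ('lam', body), or ('ap', f, a).
def shift(t, d, cutoff=0):
    """Add d to every free index (those >= cutoff)."""
    if isinstance(t, int):
        return t + d if t >= cutoff else t
    if t[0] == 'lam':
        return ('lam', shift(t[1], d, cutoff + 1))
    return ('ap', shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, j, s):
    """Replace index j by s in t, decrementing indices above j."""
    if isinstance(t, int):
        return s if t == j else (t - 1 if t > j else t)
    if t[0] == 'lam':
        return ('lam', subst(t[1], j + 1, shift(s, 1)))
    return ('ap', subst(t[1], j, s), subst(t[2], j, s))

def normalize(t):
    """Leftmost-outermost beta reduction (diverges iff no normal form)."""
    if isinstance(t, int):
        return t
    if t[0] == 'lam':
        return ('lam', normalize(t[1]))
    f, a = t[1], t[2]
    if isinstance(f, tuple) and f[0] == 'lam':
        return normalize(subst(f[1], 0, a))      # contract the head redex
    f = normalize(f)
    if isinstance(f, tuple) and f[0] == 'lam':
        return normalize(('ap', f, a))
    return ('ap', f, normalize(a))
```

Because the head redex is contracted before the argument is examined, applying \x.y (here, ('lam', 1)) to a term with no normal form still terminates, the same point made about outermost evaluation at the end of this section.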
Unfortunately, the left-hand side of the third rule overlaps itself, violating res-
triction 4 of Section 5. To avoid this overlap, we introduce a second application
operator, IAP, to distinguish applications that are not the heads of rules. The
third rule above is restricted to the case where E is applied to F by IAP. Since
\x.(E F) is applied to G by the usual application operator, AP, there is no over-
lap. Interestingly, Staples introduced essentially the same restriction in a different
notation because, without the restriction, leftmost-outermost reduction is not
optimal. This technique for avoiding overlap is discussed in Section 12.4. [OS84]
develops these ideas for evaluating lambda-terms more thoroughly, but not in the
notation of the equation interpreter. All of the observations above lead to the fol-
lowing equations.
Symbols
: Constructors for lists
cons: 2;
nil: 0;
: Constructors for lambda-terms
: IAP represents an application that is known to be inert (cannot become the head
: of a redex as the result of reductions in the subtree below).
Lambda: 2;
AP, IAP: 2;
var: 2;
incvars: 2;
: Arithmetical and logical operators
if: 3;
add: 2;
equ: 2;
less: 2;
include atomic_symbols, truth_values, integer_numerals.
For all x, y, z, E, F, G, i, j:
: Detect inert applications and mark them with IAP.
(x E) = IAP[x, E] where x is either
var[y, i]
or IAP[F, G]
or in atomic_symbols
end or
end where;
: \x:i.E represents a lambda expression that has passed by i other instances of
: lambda. It is necessary to count such passings in order to recognize instances
: of the bound variable corresponding to the x above. Only active instances of
: lambda, that is, ones that are actually applied to something, are given an
: integer tag of this sort.
(\:x.E F) = (\x:0.E F)
where x is in atomic_symbols end where;
(\y:i.var[x, j] E) = if[equ[i, j], E,
if[less[i, j], var[x, add[j, -1]],
var[x, j]]]
where i is in integer_numerals end where;
(\y:i.x E) = x where x is in atomic_symbols,
i is in integer_numerals
end where;
(\y:i.IAP[E, F] G) = ((\y:i.E G) (\y:i.F G))
where i is in integer_numerals end where;
(\y:i.\:z.E F) = \:z.(\y:add[i, 1].E incvars[F, 0])
where i is in integer_numerals end where;
incvars[var[x, i], j] = if[less[i, j], var[x, i], var[x, add[i, 1]]];
incvars[x, i] = x where x is either in atomic_symbols
or in integer_numerals
or in truth_values
or in character_strings
end or
end where;
incvars[IAP[E, F], i] = IAP[incvars[E, i], incvars[F, i]];
incvars[\x:t.E, i] = \x:incvars[t, i].incvars[E, add[i, 1]];
if[true, x, y] = x; if[false, x, y] = y;
include equint, addint, lessint.
Certain other approaches to the lambda calculus, such as the evaluation stra-
tegy in LISP, avoid some of the notational problems associated with overlapping
left-hand sides by using an evaluation operator. Essentially, such techniques
reduce eval/[E] to the normal form of E, rather than reducing E itself. Such a
solution could be programmed with equations, but it introduces two more problems,
both of which exist in standard implementations of LISP. Notice that the outer-
most evaluation strategy used by the equation interpreter exempted us from worry-
ing about cases, such as (\x.y (\x.(x x) \x.(x x))), which have a normal form, but
also an infinite sequence of reductions. Implementations of the lambda calculus
using an evaluation operator must explicitly program leftmost-outermost evaluation,
else they will compute infinitely on such examples without producing the normal
form. Also, in terms, such as \x.(\y.y z), whose normal forms contain lambdas (in
this case, the normal form is \x.z), it is very easy to neglect to evaluate the body of
the unreduced lambda binding. Using rules that reduce lambda terms directly puts
the onus on the equation interpreter to make sure that these rules are applied
wherever possible.
9.11. Lucid
Lucid is a programming language designed by Ashcroft and Wadge [AW76,
AW77] to mimic procedural computation with nonprocedural semantics. Early
attempts to construct interpreters [Ca76] and compilers [Ho78] encountered serious
difficulties. The following set of equations, adapted from [HO82b], produces a Lucid
interpreter directly, using the Standmath syntax of Section 4.1. A trivial Lucid
program, itself consisting of equations, is appended to the end of the equations that
define Lucid. Evaluation of the expression output() produces the result of running
the Lucid program. Even though convenience and performance considerations
require the eventual production of a hand-crafted Lucid interpreter, such as the
one in [Ca76], the ability to define and experiment with the Lucid language in the
simple and relatively transparent form below would certainly have been helpful in
the early stages of Lucid development.
: Equations for the programming language Lucid, plus a Lucid
: program generating a list of integers.
Symbols
: Lucid symbols
NOT: 1;
OR: 2;
add: 2;
equ: 2;
if: 3;
first: 1;
next: 1;
asa: 2;
latest: 1;
latestinv: 1;
fby: 2;
include integer_numerals, truth_values;
: symbols in the Lucid program
intlist: 0;
output: 0.
For all W, X, Y, Z:
: Definitions of the Lucid operators
NOT(true) = false;
NOT (false) = true;
NOT(fby(W, X)) = fby(NOT(first(W)), NOT(X));
NOT(latest(X)) = latest(NOT(X));
OR(true, X) = true;
OR(false, X) = X;
OR(fby(W, X), fby(Y, Z)) =
fby(OR(first(W), first(Y)), OR(X, Z));
OR(fby(W, X), latest(Y)) =
fby(OR(first(W), latest(Y)), OR(X, latest(Y)));
OR(latest(X), fby(Y, Z)) =
fby(OR(latest(X), first(Y)), OR(latest(X), Z));
OR(fby(W, X), false) = fby(W, X);
OR(latest(X), latest(Y)) = latest(OR(X, Y));
OR(latest(X), false) = latest(X);
if(true, Y, Z) = Y;
if(false, Y, Z) = Z;
if(fby(W, X), Y, Z) = fby(if(first(W), Y, Z), if(X, Y, Z));
if(latest(X), Y, Z) = latest(if(X, Y, Z));
first(X) = X
where X is either in truth_values
or in integer_numerals
end or
end where;
first(fby(X, Y)) = first(X);
first(latest(X)) = latest(X);
next(X) = X
where X is either in truth_values
or in integer_numerals
end or
end where;
next(fby(X, Y)) = Y;
next(latest(X)) = latest(X);
asa(X, Y) = if(first(Y), first(X), asa(next(X), next(Y)));
latestinv(X) = X
where X is either in truth_values
or in integer_numerals
end or
end where;
latestinv(fby(X, Y)) = latestinv(X);
latestinv(latest(X)) = X;
add(fby(W, X), fby(Y, Z)) =
fby(add(first(W), first(Y)), add(X, Z));
add(fby(W, X), latest(Y)) =
fby(add(first(W), latest(Y)), add(X, latest(Y)));
add(latest(X), fby(Y, Z)) =
fby(add(latest(X), first(Y)), add(latest(X), Z));
add(fby(W, X), Y) =
fby(add(first(W), Y), add(X, Y))
where Y is in integer_numerals end where;
add(X, fby(Y, Z)) =
fby(add(X, first(Y)), add(X, Z))
where X is in integer_numerals end where;
add(latest(X), latest(Y)) = latest(add(X, Y));
add(latest(X), Y) = latest(add(X, Y))
where Y is in integer_numerals end where;
add(X, latest(Y)) = latest(add(X, Y))
where X is in integer_numerals end where;
(fby(W, X), foy(Y, Z)) =
Tautiea dD, first(Y)), equ(X, Z));
equ(foy(W, X), latest(Y)) =
foy(equ(first(W), latest(¥)), equ(X, latest(Y)));
equ(latest(X), foy(Y, Z)) =
foy(equ(latest(X), first(Y)), equ(latest(X), Z));
equ (fby(W, X), Y) =
foylequ(first(W), Y), equ(X, Y))
where Y is in integer_numerals end where;
equ(X, foy(Y, Z)) =
foy(equ(X, first(Y)), equ(X, Z))
where X is in integer_numerals end where;
equ(latest(X), latest(Y)) = latest(equ(X, Y));
equ(latest(X), Y) = latest(equ(X, Y))
where Y is in integer_numerals end where;
equ(X, latest(Y)) = latest (equ(X, Y))
where X is in integer_numerals end where;
include addint, equint;
: A trivial Lucid program
intlist0 = fby(0, add(intlist0, 1));
output0 = fby(first(intlist0),
    fby(first(next(intlist0)),
        first(next(next(intlist0)))));
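The recursive stream definition above can be hard to read in equational form. As an illustration only (the Python generator model below is an assumption of this sketch, not part of the equation interpreter), the stream intlist0 = fby(0, add(intlist0, 1)) behaves like:

```python
# Illustrative sketch, not the equation interpreter: the Lucid stream
# intlist0 = fby(0, add(intlist0, 1)) modeled as a recursive generator.
import itertools

def intlist0():
    yield 0                      # fby(0, ...): the stream begins with 0
    for x in intlist0():         # ... followed by add(intlist0, 1)
        yield x + 1

# output0 extracts the first three elements of the stream.
print(list(itertools.islice(intlist0(), 3)))   # -> [0, 1, 2]
```

The delayed recursion through the generator plays the role that lazy evaluation of fby plays in Lucid.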
The equational program given above differs from other Lucid interpreters in
one significant way, and two superficial ways. First and most significant, the OR
operator defined above is not as powerful as the OR operator in Lucid, because it
fails to satisfy the equation OR(X,true)=true, when X cannot be evaluated to a
truth value. The weakening of the OR operator is required by restrictions 3 and 5
of Section 5. These restrictions will be relaxed in later versions of the equation
interpreter, allowing a full implementation of the Lucid OR. Second, the variable
INPUT, which in Lucid is implicitly defined to be the sequence of values in the
input file of the Lucid program when it runs, is not definable until run time, so it
cannot be given in the equations above. In order to mimic the input behavior of
Lucid, the equation interpreter would have to be used with a syntactic preprocessor
to embed given inputs within the term to be evaluated. Interactive input would
require an interactive interface to the equation interpreter. Such an interactive
interface does not exist in the current version, but is a likely addition in later ver-
sions (see Section 15.3). Finally, of the many primitive arithmetic and logical
operations of Lucid, only add, equ, OR, and if have been given above. To include
other such operations requires duplicating the equations distributing primitive
operations over fby and latest. With a large set of primitives, these equations
would become unacceptably unwieldy. A truly satisfying equational implementa-
tion of Lucid would have to encode primitive operations as nullary symbols, and
use an application operator similar to the one in the Curry inner syntax of Section
4.4 in order to give only one set of distributive equations.
10. Errors, Failures, and Diagnostic Aids
In the interest of truth in software advertising, exceptional cases in the equation
interpreter are divided into two classes: errors and failures. Errors are definite
mistakes on the part of the user resulting from violations of reasonable and concep-
tually necessary constraints on processing. Failures are the fault of the inter-
preter itself, and include exhaustion of resources and exceeding of arbitrary limits.
Each message on an exceptional case is produced on the UNIX standard error file,
begins with the appropriate word "Error" or "Failure", and ends with an identify-
ing message number, intended to help in maintenance. An attempt is made to
explain the error or failure so that the user may correct or avoid it. The eventual
goal of the project is that the only type of failure occurring in the reduction of a
term to normal form will be exhaustion of the total space resources. Currently, the
interpreter will fail when presented with individual input symbols that are too long,
but it will not fail due to overflow of a value during reduction. There are also some
possible failures in the syntactic preprocessing and output pretty-printing steps that
result in messages from yacc (the UNIX parser generator) rather than from the
equational interpreter system. These failures apparently are all the result of
overflow of some allocated space, particularly the yacc parsing stack. Occasionally,
running of a large problem, or of too many problems simultaneously, will cause an
overflow of some UNIX limits, such as the limit on the number of processes that
may run concurrently.
Because of the layered modular design of the interpreter, different sorts of
errors may be reported at different levels of processing, and, regrettably, in slightly
different forms. For the preprocessor (ep), the important levels are 1) context-free
syntactic analysis, 2) context-sensitive syntactic analysis, and 3) semantic processing.
For the interpreter (ei), only levels 1 and 3 are relevant. Sections 10.1
through 10.3 describe the sorts of messages produced at each of these levels.
10.1. Context-Free Syntactic Errors and Failures
Context-free syntactic errors in preprocessor input may involve the general syntax
of definitions, described in Section 3, or one of the specific syntaxes for terms
described in Section 4. Context-free errors in interpreter input may only involve a
specific term syntax. Error messages relating to a specific term syntax always
include the name of the syntax being used. Error detection is based on the parsing
strategy used by yacc [Jo78]. Each error message includes a statement of the syn-
tactic restriction most likely to cause that sort of parsing failure. The parser makes
no attempt to recover from an error, so only the first syntactic error is likely to be
reported. It is possible that an error in a term will be detected as an error in the
general syntax of definitions, and vice versa. Error messages are particularly
opaque when the wrong syntactic preprocessor was loaded by the last invocation of
loadsyntax, so the user should always pay attention to the name of the syntax in
use. Yacc failures are possible in the syntactic preprocessing, either from parser
stack overflow, or from an individual symbol being too long.
10.2. Context-Sensitive Syntactic Errors and Failures
Context-sensitive errors are only relevant to preprocessor input. They all involve
inconsistent use of symbols. The five types of misuse are: 1) repeated declara-
tion of the same symbol; 2) use of a declared symbol with the wrong arity; 3)
attempt to include a class of symbols or equations that does not exist; 4) repetition
of a variable symbol on the left-hand side of an equation; 5) appearance of a vari-
able on the right-hand side of an equation that does not appear on the left.
Context-sensitive syntactic preprocessing may fail due to exhaustion of space
resources, or to an individual symbol being too long for the current version. The
second sort of failure will be avoided in later versions. In order to produce a lexi-
con presenting all of the symbols used in an equational program, see Section 10.4
below.
10.3. Semantic Errors and Failures
The only semantic failure in the interpreter is exhaustion of total space resources.
Other semantic errors and failures are only relevant to preprocessor input. The
simplest such error is use of a symbol from one of the classes integer_numerals,
atomic_symbols, truth_values, or characters, without a declaration of that
class. In future versions, these errors will be classified as context-sensitive syn-
tactic errors. The more interesting errors are violations of restrictions 3, 4, and 5
from Section 5. Violations of these restrictions always involve nontrivial overlay-
ings of parts of left-hand sides of equations. In addition to describing which res-
triction was violated, and naming the violating equations, the preprocessor tries to
report the location of the overlap by naming the critical symbol involved. This is
probably the weakest part of the error reporting, and future versions will try to
provide more graphic reports for semantic errors. Notice that restriction 5 (left-
sequentiality) will be removed in later versions. To specify the offending equations,
the preprocessor numbers all equations, including predefined classes (counting 1 for
each class), and reports equations by number. In order to be sure of the number-
ing used by the preprocessor, and in order to get a more graphic view of the terms
in the tree representation used by the preprocessor, the user should see Section 10.5
below.
10.4. Producing a Lexicon to Detect Inappropriate Uses of Symbols (el)
After executing
ep Equnsdir
the user may produce a lexicon listing in separate categories
1) all declared literal symbols
2) all declared literal symbols not appearing in equations
3) all atomic symbols appearing in equations
4) all characters appearing in equations
5) all truth values appearing in equations.
Empty categories are omitted, and symbols within a category are given in alphabet-
ical order. A lexicon is produced on the standard output by typing
el Equnsdir
el stands for equational lexicon. The lexicon is intended to be used to discover
accidental misspellings and omissions that may cause a symbol to belong to a
category other than the one intended. Each lexicon is headed by the date and time
of the last invocation of ep. Changes to definitions after the given date and time
will not be reflected in the lexicon.
10.5. Producing a Graphic Display of Equations In Tree Form (es)
In order to understand the semantic errors described in Section 10.3, it is useful to
see a set of equations in the same form that the preprocessor sees. Not only is this
internal form tree-structured, rather than linear, but there may be literal symbols
appearing in the internal form that are only implicit in the given definitions, such
as the symbol cons, which appears implicitly in the LISP.M expression (a b c).
The user may also use the tree-structured form of the terms in his equations to ver-
ify that the matching of parentheses and brackets in his definitions agrees with his
original intent. To generate a tree-structured display of equations on the standard
output, type
es Equnsdir
es stands for equation show. Unfortunately, the more mnemonic abbreviations are
already used for other commands. es may only be used after running ep on the
same directory. The output from es lists the equations in the order given by the
user, with the sequential numbers used in error and failure reports from the prepro-
cessor. Each term in an equation is displayed by listing the symbols in the term in
preorder, and using indentation to indicate the tree structure. Variables on the
left-hand sides of equations are replaced by descriptions of their ranges, in pointed
brackets (<>), and variables on the right-hand sides are replaced by the
addresses of the corresponding variables on the left-hand sides. Representations of
predefined classes of equations are displayed, as well as equations given explicitly
by the user. For example, the following definitions
Symbols
f: 2;
g: 2;
h: 1;
include atomic_symbols.
For all x, y, z:
f(g(x, y), a) = h(y) where x is in atomic_symbols end where;
include equatom.
produce the listing:
Listing of equational definitions processed on Apr 19 at 15:43
1:
f
  g
    <atomic_symbol>
    <anything>
  a
h
  variable 1 2
2:
equ
  <atomic_symbol>
  <atomic_symbol>
e
  variable 1
  variable 2
Notice that, on the left-hand side of equation 1, the variable x is replaced by
<atomic_symbol>, and the variable y is replaced by <anything>, representing
the fact that any term may be substituted for y. On the right-hand side, y is
replaced by
variable 1 2
indicating that the corresponding y on the left-hand side is the 2nd son of the 1st
son of the root of the term. The date and time at the top refer to the time of invo-
cation of ep. The user should check that this time agrees with his memory.
Changes to definitions after the given date and time are not reflected in the display.
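The indentation scheme is easy to reproduce by hand. The following sketch (Python, with an assumed (symbol, children) tuple representation of terms; it is not the actual es program) prints a term in preorder, indenting one level per son, and follows a variable address of the kind shown above:

```python
# Sketch only: an assumed (symbol, children) representation of terms,
# not the internal form actually used by the preprocessor.
def show(term, depth=0):
    symbol, children = term
    print("  " * depth + symbol)      # preorder: the symbol first ...
    for child in children:
        show(child, depth + 1)        # ... then each son, indented

def subterm(term, address):
    """Follow an address like 'variable 1 2': each number selects a son
    of the current node, counting from 1."""
    for i in address:
        term = term[1][i - 1]
    return term

# The left-hand side f(g(x, y), a), with variables shown as ranges:
lhs = ("f", [("g", [("<atomic_symbol>", []), ("<anything>", [])]),
             ("a", [])])
show(lhs)
# Address 1 2: the 2nd son of the 1st son of the root -- y's position.
print(subterm(lhs, [1, 2])[0])        # -> <anything>
```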
10.6. Trace Output (et)
A primitive form of trace output is available, which displays for each reduction
step the starting term, the redex, the number of the equational rule applied, and
the reductum. In order to produce trace output, invoke the equation interpreter
with the option t as
ei Equnsdir t <input
where Equnsdir is the directory containing the equational definitions, and input is
a file containing the term to be reduced. Since ei uses purely positional notation
for its parameters, Equnsdir may not be omitted. The invocation of ei above pro-
duces a file Equnsdir/trace.inter containing a complete trace of the reduction of
the input term to normal form. To view the trace output on the screen, type
et Equnsdir
(Equnsdir defaults to .). et stands for equational trace. The trace listing is
headed by the date and time of the invocation of ei resulting in that trace. The
user should check that the given time agrees with his memory.
10.7. Miscellaneous Restrictions
Literal symbols are limited to arities no greater than 10, and all symbols are lim-
ited to lengths no greater than 20 in the current version.
11. History of the Equation Interpreter Project
The theoretical foundations of the project come from the dissertation "Reduction
Strategies in Subtree Replacement Systems," presented by Michael O’Donnell at
Cornell University in 1976. The same material is available in the monograph Com-
puting in Systems Described by Equations [O’D77]. There, the fundamental res-
trictions 1-4 on the left-hand sides of equations in Section 5 were presented, and
shown to be sufficient for guaranteeing uniqueness of normal forms. In addition,
outermost reduction strategies were shown to terminate whenever possible, and con-
ditions were given for the sufficiency of leftmost-outermost reductions. A proof of
optimality for a class of reduction strategies was claimed there, but shown incorrect
by Berry and Lévy [BL79, O’D79]. Huet and Lévy later gave a correct treatment
of essentially the same optimality issue [HL79].
In the theoretical monograph cited above, O’Donnell asserted that "a good
programmer should be able to design efficient implementations of the abstract com-
putations" described and studied in the monograph. In 1978, Christoph Hoffmann
and O’Donnell decided to demonstrate that such an implementation is feasible and
valuable. The original intent was to use the equations for formal specifications of
interpreters for nonprocedural programming languages. For example, the equa-
tions that McCarthy gave to define LISP [McC60] could be given, and the equa-
tion processor should automatically produce a LISP interpreter exactly faithful to
those specifications. Preliminary experience indicated that such applications were
severely handicapped in performance. On the other hand, when essentially the
same computation was defined directly by a set of equations, the equation inter-
preter was reasonably competitive with conventional LISP. So, the emphasis of the
project changed from interpreter generation to programming directly with equations.
From early experience, the project goal became the production of a usable
interpreter of equations with very strict adherence to the semantics given in Section
1, and performance reasonably competitive with conventional LISP interpreters.
The specification of such an interpreter was given in [HO82b], and the key imple-
mentation problems were discussed there. Since the natural way of defining a sin-
gle function might involve a large number of equations, the second goal requires
that the interpreter have little or no runtime penalty for the number of equations
given. Thus, sequential checking for applicability of the first equation, then the
second, etc. was ruled out, and pattern matching in trees was identified as the key
algorithmic problem for the project. The overhead of pattern matching appears to
be the aspect of the interpreter that must compete with the rather slight overhead
of maintaining the recursion stack in LISP. Some promising algorithms for tree
pattern matching were developed in [HO82a].
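For concreteness, the naive strategy that was ruled out can be sketched as follows (Python and the term representation are assumptions of this illustration, not the project's implementation). Its cost at every redex grows linearly with the number of equations, which is exactly what a combined tree pattern-matching automaton avoids:

```python
# Illustrative sketch, not the interpreter's code: terms are
# (symbol, children) tuples; pattern variables are bare strings.
def match(pattern, term, binding=None):
    """Try to match one left-hand side against the root of a term,
    returning a variable binding on success and None on failure."""
    if binding is None:
        binding = {}
    if isinstance(pattern, str):          # a variable matches anything
        binding[pattern] = term
        return binding
    psym, pkids = pattern
    tsym, tkids = term
    if psym != tsym or len(pkids) != len(tkids):
        return None
    for p, t in zip(pkids, tkids):
        if match(p, t, binding) is None:
            return None
    return binding

def first_applicable(lhss, term):
    """The sequential check that was ruled out: try equation 1, then 2, ..."""
    for number, lhs in enumerate(lhss, 1):
        if match(lhs, term) is not None:
            return number
    return None

# Left-hand sides f(x) and g(a, x), matched against g(a, h(b)):
lhss = [("f", ["x"]), ("g", [("a", []), "x"])]
print(first_applicable(lhss, ("g", [("a", []), ("h", [("b", [])])])))  # -> 2
```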
In 1979 Giovanni Sacco, a graduate student, produced the first experimentally
usable version of the interpreter in CDC Pascal, and introduced some table-
compression techniques which, without affecting the theoretical worst case for pat-
tern matching, improved performance substantially on example problems.
Hoffmann ported Sacco’s implementation to the Siemens computer at the Univer-
sity of Kiel, Germany in 1980. Hoffmann and O’Donnell used Sacco’s implementa-
tion for informal experiments with LISP, the Combinator Calculus, and the
Lambda Calculus. These experiments led to the decision to emphasize program-
ming with equations over interpreter generation. These experiments also demon-
strated the inadequacy of any single notation for all problems, and motivated the
library of syntaxes provided by the current version. Another graduate student,
Paul Golick, transferred the implementation to UNIX on the VAX, and rewrote
the run-time portion of the interpreter (ei in the current version) in 1980. During
1982 and Spring of 1983, O’Donnell took over the implementation effort and pro-
duced the current version of the system. The final year of work involved informal
experiments with three different pattern matching techniques, and reconfiguration
of the implementation to allow easy substitution of different concrete syntaxes.
Experience with the interpreter comes from the interpreter implementation
itself, from two projects done in the advanced compiler course at Purdue Univer-
sity, and from a course in Logic Programming at the Johns Hopkins University.
O’Donnell used the equation interpreter to define the non-context-free syntactic
analysis for itself, gaining useful informal experience in the applicability of the
interpreter to syntactic problems. In 1982, Hoffmann supervised a class project
which installed another experimental pattern-matching algorithm in the interpreter,
and used the equation interpreter to define a Pascal interpreter. In 1983,
Hoffmann supervised another class project using the equation interpreter to define
type checking in a Pascal compiler. These two projects generated more information
on the suitability of various pattern-matching algorithms, and on the applicability
of equations to programming language problems. In 1983, O’Donnell assigned stu-
dents in a Logic Programming course to a number of smaller projects in equational
programming. One of these projects found the first natural example of a theoreti-
cal combinatorial explosion in one of the pattern-matching algorithms.
12. Low-Level Programming Techniques
Compared to the syntactic restrictions of conventional languages like PASCAL, the
restrictions on equations described in Section 5 are a bit subtle. We believe that
the additional study needed to understand the restrictions is justified for several
reasons. First, the restrictions are similar in flavor to those imposed by determinis-
tic parsing strategies such as LR and LALR, and perhaps even a bit simpler. The
trouble taken to satisfy the restrictions is rewarded by the guarantee that the
resulting program produces the same result, independently of the order of evalua-
tion. This reward should become very significant on parallel hardware of the
future, where the trouble of insuring order-independence in a procedural program
may be immense. Finally, there are disciplined styles of programming with equa-
tions that can avoid errors, and techniques for correcting the errors when they
occur. A few such techniques are given in this section; we anticipate that a sizable
collection will result from a few years of experience.
12.1. A Disciplined Programming Style Based on Constructor Functions
In many applications of equational programming, the function symbols may be par-
titioned into two classes:
1. constructor symbols, used to build up static data objects, and
2. defined symbols, used to perform computations on the data objects.
For example, in LISP M-expressions, the atomic symbols, nil, and the binary sym-
bol cons, are constructors, and all metafunction symbols are defined symbols.
Technically, a constructor is a symbol that never appears as the outermost symbol
on the left-hand side of an equation, and a defined symbol is one that does appear
as the outermost symbol on a left-hand side. The constructor discipline consists of
never allowing a defined symbol to appear on a left-hand side, except as the outer-
most symbol. An equational program that respects the constructor discipline
clearly satisfies the nonoverlapping restriction 4 of Section 5.
Example 12.1.1
The following set of equations, in standard mathematical notation, does not respect
the constructor discipline, although it does not contain an overlap.
Symbols
f: 1;
g: 2;
include atomic_symbols.
For all x:
f(g(a, x)) = g(a, f(x));
g(b, x) = a.
The symbol g is a defined symbol, because it appears outermost on the left-hand
side of the second equation, but g also appears in a nonoutermost position on the
left-hand side of the first equation. On the other hand, the following set of equa-
tions accomplishes the same result, but respects the constructor discipline.
For all x:
f(h(x)) = g(a, f(x));
g(a, x) = h(x);
g(b, x) = a.
Here, h, a, and b are constructors, f and g are defined symbols. Neither f nor g
appears on a left-hand side except as the outermost symbol. The occurrences of f
and g on the right-hand sides are irrelevant to the constructor discipline.
□
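The technical definition above can be checked mechanically. The sketch below (Python, with an assumed term representation; the interpreter itself performs no such check) classifies symbols and tests the discipline for the first program of the example:

```python
# Sketch only: equations are (lhs, rhs) pairs of (symbol, children)
# tuples, with variables as bare strings; not the interpreter's format.
def respects_constructor_discipline(equations):
    # Defined symbols: those appearing outermost on some left-hand side.
    defined = {lhs[0] for lhs, _ in equations}

    def inner_symbols(term):              # all nonoutermost symbols
        for child in term[1]:
            if not isinstance(child, str):
                yield child[0]
                yield from inner_symbols(child)

    # The discipline: no defined symbol nonoutermost on a left-hand side.
    return all(s not in defined
               for lhs, _ in equations
               for s in inner_symbols(lhs))

# Example 12.1.1, first program: f(g(a, x)) = g(a, f(x)); g(b, x) = a.
eqs = [(("f", [("g", [("a", []), "x"])]),
        ("g", [("a", []), ("f", ["x"])])),
       (("g", [("b", []), "x"]), ("a", []))]
print(respects_constructor_discipline(eqs))   # -> False: g occurs inside a lhs
```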
The constructor discipline avoids violations of the nonoverlapping restriction 4, but
it does not prevent violations of restriction 3, which prohibits two different left-
hand sides from matching the same term. For example, f(a,x)=a and f(x,a)=a
violate restriction 3, although the defined symbol f does not appear on a left-hand
side except in the outermost position.
When the constructor discipline is applied, the appearance of a defined symbol
in a normal form is usually taken to indicate an error, either in the equations or in
the input term. Many other research projects, particularly in the area of abstract
data types, require the constructor discipline, and sometimes require that defined
symbols do not appear in normal forms [GH78]. The latter requirement is often
called sufficient completeness.
It is possible to translate every equational program satisfying restrictions 1-4
(i.e., the regular term reduction systems) into an equational program that respects
the constructor discipline. The idea, described by Satish Thatte in [Th85], is to
create two versions, f and f’, of each defined symbol that appears in a nonouter-
most position on a left-hand side. f remains a defined symbol, while f’ becomes a
constructor. Every offending occurrence of f (i.e., nonoutermost on a left-hand
side) is replaced by f’. In addition, equations are added to transform every f
that heads a subterm not matching a left-hand side into f'.
Example 12.1.2
Applying the procedure described above to the first equational program in Example
12.1.1 yields the following program.
For all x:
f(g'(a, x)) = g(a, f(x));
g(a, x) = g'(a, x);
g(b, x) = a.
□
In the worst case, this procedure could increase the size of the program quadrati-
cally, although worst cases do not seem to arise naturally. At any rate, the con-
structor discipline should probably be enforced by the programmer as he programs,
rather than added on to a given program. In Section 12.4 we show how to use a
similar procedure to eliminate overlaps.
The constructor discipline is rather sensitive to the syntactic form that is actu-
ally used by the equation interpreter.
Example 12.1.3
Consider the following program, given in Lambda notation.
Symbols
AP: 2;
include atomic_symbols.
For all x, y:
(REFLECT (CONS x y)) = (CONS (REFLECT x) (REFLECT y));
(REFLECT x) = x where x is in atomic_symbols end where.
Recall that this is equivalent to the standard mathematical notation:
For all x, y:
AP(REFLECT, AP(AP(CONS, x), y)) =
AP(AP(CONS, AP(REFLECT, x)), AP(REFLECT, y));
AP(REFLECT, x) = x where x is in atomic_symbols end where.
This program does not respect the constructor discipline, as the defined symbol AP
appears twice in nonoutermost positions in the left-hand side of the first equation.
As long as no inputs will contain the symbols REFLECT or CONS except applied
(using AP) to precisely one or two arguments, respectively, the same results may
be obtained by the following un-Curried program in standard mathematical nota-
tion.
For all x, y:
REFLECT(CONS (x, y)) = CONS(REFLECT(x), REFLECT(y));
REFLECT(x) = x where x is in atomic_symbols end where.
The last program respects the constructor discipline.
□
Example 12.1.4
Weak reduction in the combinator calculus [Sc24, St72] may be programmed in
Lambda notation as follows.
Symbols
AP: 2;
include atomic_symbols.
For all x, y, z:
(S x y z) = (x z (y z));
(K x y) = x.
As in the first program of Example 12.1.3, the constructor discipline does not hold,
because of the implicit occurrences of the defined symbol AP in nonoutermost posi-
tions of the first left-hand side. The left-hand sides may be un-Curried to
S(x, y, z)
K(x, y)
The latter program respects the constructor discipline, with S and K being defined
symbols, and no constructors mentioned in the left-hand sides. The right-hand
sides cannot be meaningfully un-Curried, without extending the notation to allow
variables standing for functions.
□
One is tempted to take a symbol in Curried notation as a defined symbol when
it appears leftmost on a left-hand side of an equation. Unfortunately, this natural
attempt to extend the constructor discipline systematically to Curried notation fails
to guarantee the nonoverlapping property.
Example 12.1.5
In the following program, given in Lambda notation, the symbol P appears only
leftmost in left-hand sides of equations.
Symbols
AP: 2;
include atomic_symbols.
For all x, y, z:
(P x y) = Q;
(P x y z) = R.
The two left-hand sides overlap, however, and (P Q Q Q) has the two different nor-
mal forms (Q Q) and R.
□
Informally, the overlap violation above appears to translate to a violation of restric-
tion 3 in an un-Curried notation. Formalization of this observation would require a
treatment of function symbols with varying arities. The appropriate formalization
for this case is not hard to construct, but other useful syntactic transformations
besides Currying may arise, and might require totally different formalisms to relate
them to the constructor discipline.
Because of the sensitivity of the constructor discipline to syntactic assump-
tions, and because the enforcement of this discipline may lead to longer and less
clear equational programs, the equation interpreter does not enforce such a discip-
line. Whenever a particular problem lends itself to a solution respecting the con-
structor discipline, we recommend that the programmer enforce it on himself, and
document the distinction between constructors and defined symbols. So far, most
examples of equational programs that have been run on the interpreter have
respected the constructor discipline, and the examples of nonoverlapping equations
not based on constructors have been few, and often hard to construct. So, experi-
ence to date fails to give strong support for the utility of the greater generality of
nonoverlapping equations. We expect that future versions of the interpreter will
enforce even weaker restrictions, based on the Knuth-Bendix closure algorithm
[KB70], and that substantial examples of programs requiring this extra generality
will arise. Further research is required to adapt the Knuth-Bendix procedure,
which was designed for reduction systems in which every term has a normal form,
to nonterminating systems.
12.2. Simulation of LISP Conditionals
The effort expended in designing and implementing the equation interpreter would
be wasted if the result were merely a syntactic variant of LISP. For the many
problems, and portions of problems, however, for which LISP-style programming is
appropriate, a programmer may benefit from learning how to apply an analogous
style to equational programming. The paradigm of LISP programming is the
recursive definition of a function, based on a conditional expression. The general
form, presented in informal notation, looks like
f[x] =  if P1[x] then E1
        else if P2[x] then E2
        ...
        else if Pn[x] then En
        else En+1
where P1[x] is usually "x is nil", or occasionally "x is atomic", and P2, ..., Pn
require more and more structure for x. In order to program the same computation
for the equation interpreter, each line of the conditional is expressed as a separate
equation. The conditions P1, ..., Pn are expressed implicitly in the structure of
the arguments to f on the left-hand sides of the equations, and occasionally in syn-
tactic restrictions on the variables. Since there is no order for the equations, the
effect of the order of conditional clauses must be produced by letting each condi-
tion include the negation of all previous conditions. As long as the conditions deal
only with the structure of the argument, rather than computed qualities of its
value, this translation will produce a more readable form than LISP syntax, and
the incorporation of negations of previous conditions will not require expansion of
the size of the program. The else clause must be translated into an equation that
applies precisely in those cases where no other condition holds. Expressing this
condition explicitly involves some extra trouble for the programmer, but has the
benefit of clarifying the case analysis, and illuminating omissions that might be
more easily overlooked in the conditional form. If the programmer accidentally
provides two equations that could apply in the same case, the interpreter detects a
violation of restriction 3. If he neglects to cover some case, the first time that a
program execution encounters such a case, the offending application of f will
appear unreduced in the output, displaying the omission very clearly.
Example 12.2.1
Consider the following informal definition of a function that flattens a binary tree
into a long right branch, with the same atomic symbols hanging off in the same
order.
flat[x] =  if x is atomic then x
           else if car[x] is atomic then cons[car[x]; flat[cdr[x]]]
           else flat[cons[car[car[x]]; cons[cdr[car[x]]; cdr[x]]]]
The actual LISP program, using the usual abbreviations for compositions of car
and cdr, follows.
(DEF '(FLAT (LAMBDA (X)
    (COND
        ((ATOM X) X)
        ((ATOM (CAR X)) (CONS (CAR X) (FLAT (CDR X))))
        (T (FLAT (CONS (CAAR X) (CONS (CDAR X) (CDR X)))))))))
The same computation is described by the following equational program, using
LISP.M notation.
Symbols
flat: 1;
cons: 2;
nil: 0;
include atomic_symbols.
For all x:
flat[x] = x where x is in atomic_symbols end where;
flat[(x . y)] = (x . flat[y]) where x is in atomic_symbols end where;
flat[((x . y) . z)] = flat[(x . (y . z))].
□
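To check one's reading of the three equations, the same function can be sketched in an ordinary language (Python here; modeling dotted pairs as 2-tuples and atoms as strings is an assumption of the sketch, not part of the book's system):

```python
# Sketch only: dotted pairs modeled as 2-tuples, atoms as strings.
def flat(x):
    if isinstance(x, str):            # flat[x] = x, for atomic x
        return x
    car, cdr = x
    if isinstance(car, str):          # flat[(x . y)] = (x . flat[y])
        return (car, flat(cdr))
    # flat[((x . y) . z)] = flat[(x . (y . z))]
    return flat((car[0], (car[1], cdr)))

print(flat((("a", "b"), "c")))        # -> ('a', ('b', 'c'))
```

Each branch of the Python conditional corresponds to one equation, illustrating how the implicit case analysis in the left-hand sides replaces the LISP conditional.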
When conditions in a LISP program refer to predicates that must actually be com-
puted from the arguments to a function, rather than to the structure of the argu-
ments, the programmer must use a corresponding conditional function in the equa-
tional program. The translation from LISP syntax for a conditional function to the
LISP.M notation for the equation interpreter is utterly trivial.
12.3. Two Approaches to Errors and Exceptional Conditions
The equation interpreter has no built-in concept of a run-time error. There are
failures at run-time, due to insufficient space resources, but no exceptional condi-
tion caused by application of a function to inappropriate arguments is detected.
We designed the interpreter this way, not because we believe that run-time error
detection is undesirable, but rather because it is completely separated from the
other fundamental implementation issues. We decided to provide an environment
in which different ways of handling errors and exceptions may be tried, rather than
committing to a particular one. If errors are only reported to the outside world,
then the reporting mechanism is properly one for a syntactic
postprocessor. Certain normal forms, such as car[()] in LISP, are perfectly
acceptable to the equation interpreter, but might be reported as errors when
detected by a postprocessor. Total support of this approach to errors may require a
mechanism for halting evaluation before a normal form is reached, but that
mechanism will be provided in future versions as a general augmentation of the
interface (see Section 15.3), allowing the interpreter to act in parallel with other
components. No specific effort should be required for error detection.
If a programmer wishes to detect and react to errors and exceptional condi-
tions within an equational program, two basic strategies suggest themselves. In the
first strategy, exception detection is provided by special functions that inspect a
structure to determine that it may be manipulated in a certain way, before that
manipulation is attempted. In the other strategy, special symbols are defined to
represent erroneous conditions. Reaction to exceptions is programmed by the way
these special symbols propagate through an evaluation. We chose to provide a set-
ting in which many strategies can be tested, rather than preferring one.
Example 12.3.1
Consider a table of name-value pairs, implemented as a list of ordered pairs. The
table is intended to represent a function from some name space to values, but occa-
sionally certain names may be accidentally omitted, or entered more than once. In
a program for a function to look up the value associated with a given name, it may
be necessary to check that the name occurs precisely once. The following programs
all use LISP.M notation. The first applies the strategy of providing checking func-
tions. Efficiency has been ignored in the interest cf clarifying the fundamental
issue.
Symbols
cons: 2;
nil: 0;
occurs: 2;
legmap: 2;
lookup: 2;
add: 2;
equ: 2;
if: 3;
include atomic_symbols, integer_numerals, truth_values.
For all m, n, l, v:
occurs[m; ()] = 0;
occurs[m; ((n . v) . l)] =
    if[equ[m; n]; add[occurs[m; l]; 1]; occurs[m; l]];
legmap[m; l] = equ[occurs[m; l]; 1];
lookup[m; ((n . v) . l)] =
    if[equ[m; n]; v; lookup[m; l]];
if[true; m; n] = m;    if[false; m; n] = n;
include equint, addint, equatom.
When lookup[m; l] is used in a larger program, the programmer will have to test
legmap[m; l] first, if there is any chance that m is not associated uniquely in l.
Essentially the same facility is provided by the following program, which applies
the strategy of producing special symbols to represent errors.
Symbols
cons: 2;
nil: 0;
lookup: 2;
undefined, overdefined: 0;
if: 3;
equ: 2;
include atomic_symbols, truth_values.
For all m, n, l, v:
lookup[m; ()] = undefined[];
lookup[m; ((n . v) . l)] =
    if[equ[m; n];
       if[equ[lookup[m; l]; undefined[]]; v; overdefined[]];
       lookup[m; l]];
if[true; m; n] = m;    if[false; m; n] = n;
include equatom.
□
Either strategy may be adapted to produce more or less information about the pre-
cise form of an error or exceptional occurrence. There appears to be no technical
reason to prefer one to the other. The choice must depend on the programmer’s
taste. It is probably foolish to mix the two strategies within one program.
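Both strategies of Example 12.3.1 can be mimicked in Python over an association list of (name, value) pairs. This is my own sketch for comparison; the function names and list encoding are assumptions, not the book's interpreter.

```python
# Strategy 1: checking functions, tested before lookup is attempted.
def occurs(m, l):
    # Number of pairs in l whose name is m.
    return sum(1 for (n, v) in l if n == m)

def legmap(m, l):
    # The table is a legitimate map at m only if m occurs exactly once.
    return occurs(m, l) == 1

def lookup1(m, l):
    for (n, v) in l:
        if n == m:
            return v

# Strategy 2: special symbols that represent erroneous conditions and
# propagate through the evaluation.
UNDEFINED, OVERDEFINED = "undefined", "overdefined"

def lookup2(m, l):
    if not l:
        return UNDEFINED
    (n, v), rest = l[0], l[1:]
    if n == m:
        # A second hit deeper in the list means m is multiply defined.
        return v if lookup2(m, rest) == UNDEFINED else OVERDEFINED
    return lookup2(m, rest)
```

A caller using strategy 1 guards each lookup1 with legmap; a caller using strategy 2 inspects the result for UNDEFINED or OVERDEFINED instead.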
12.4. Repairing Overlaps and Nonsequential Constructs
When a set of logically correct equations is rejected by the equation interpreter
because of overlapping left-hand sides, there are two general techniques that may
succeed in removing the overlap, without starting from scratch. The first, and sim-
plest, is to generalize the equation whose left-hand side applies outermost in the
offending case, so that the overlap no longer involves explicitly-given symbols, but
only an instance of a variable.
Example 12.4.1
Consider lists of elements in LISP.M notation, where a special element missing,
different from nil, is to be ignored whenever it appears as a member of a list.
Such a missing element would allow recursive deletion of elements from a list in
an especially simple way. The equation defining the behavior of missing may
easily overlap with other equations.
Symbols
cons: 2;
nil: 0;
missing: 0;
: mirror[l] is l concatenated with its own reversal.
mirror: 1;
include atomic_symbols.
For all l, l1, x:
(missing[] . l) = l;
mirror[l] = append[l; reverse[l]] where l is either ()
        or (x . l1)
    end or
end where;
The two equations in the fragment above overlap in the term
mirror[(missing[] . l)]
This overlap may be avoided by deleting the where clause on the second equation,
and allowing the mirror operation to apply to elements other than lists, perhaps
with meaningless results. Of course, the first equation will certainly overlap with
other equations defining, for example, reverse and append, and these overlaps will
require other avoidance techniques.
□
The technique of generalization, shown above, only works when unnecessarily
restrictive equations have led to overlap. More often, overlaps may be removed by
restricting the equation whose left-hand side appears innermost in the offending
case, either by giving more structure to the left-hand side, or by adding extra sym-
bols to differentiate different instances of the same operation. The second method
seems to be unavoidable in some cases, but it regrettably decreases the readability
and generality of the program.
Example 12.4.2
Consider an unusual setting for list manipulation, based on an associative concate-
nation operator cat, instead of the constructor cons of LISP. An atomic symbol a
is identified with the singleton list containing only a. The following program, given
in standard mathematical notation, enforces the associativity of cat by always asso-
ciating to the right, and defines reversal as well.
Symbols
cat: 2;
reverse: 1;
include atomic_symbols.
For all x, y, z:
cat(cat(x, y), z) = cat(x, cat(y, z))
where x is in atomic_symbols end where;
reverse(cat(x, y)) = cat(reverse(y), reverse(x));
reverse(x) = x where x is in atomic_symbols end where.
The restriction of x in the first equation to be an atomic symbol prevents a self-
overlap of the form cat(cat(cat(A,B),C),D), but there is still an overlap between
the first and second equations in the form reverse(cat(cat(A,B),C)). The same
effect may be achieved, without overlap, by restricting the variable x to atomic
symbols in the second equation as well. This correction achieves the right output,
but incurs a quadratic cost for reverse, because of the reassociation of cats implied
by it. See Section 16.1 for a more thorough development of this novel approach to
list manipulation, achieving the linear reversal cost that was probably intended by
the program above.
□
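The repaired equations (with x restricted to atomic symbols in the second equation as well) can be sketched as a naive Python evaluator over tagged tuples ('cat', x, y) and ('rev', x), with atoms as strings. The encoding and the normalize driver are my own assumptions, not the interpreter's reduction strategy.

```python
def is_atom(t):
    return isinstance(t, str)

def normalize(t):
    if is_atom(t):
        return t
    op = t[0]
    if op == 'cat':
        x, y = normalize(t[1]), normalize(t[2])
        if not is_atom(x):
            # cat(cat(a, b), y) = cat(a, cat(b, y)): reassociate rightward
            _, a, b = x
            return normalize(('cat', a, ('cat', b, y)))
        return ('cat', x, y)
    if op == 'rev':
        x = normalize(t[1])
        if is_atom(x):
            return x                      # reverse(x) = x, x atomic
        _, a, b = x
        # reverse(cat(a, b)) = cat(reverse(b), reverse(a))
        return normalize(('cat', ('rev', b), ('rev', a)))
```

Reversing a right-associated list this way rebuilds and reassociates intermediate cats, which is where the quadratic cost noted above comes from.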
Example 12.4.3
In order to approach an implementation of the lambda calculus, one might take the
Curried notation in which binary application is the only operation, and add a syn-
tactic substitution operator. The following program defines substitution, and for
the sake of simplicity only the identity function, using standard mathematical nota-
tion. subst(x,y,z) is intended to denote the result of substituting x for each
occurrence of y in z.
Symbols
include atomic_symbols, truth_values.
For all w, x, y, z:
AP(I(), x) = x;
subst(w, x, y) = if(equ(x, y), w, y)
    where y is in atomic_symbols end where;
subst(w, x, AP(y, z)) = AP(subst(w, x, y), subst(w, x, z));
if(true, x, y) = x;    if(false, x, y) = y;
include equatom.
The first and third equations overlap in the form subst(A, B, AP(I(), C)). In order
to avoid this overlap, change nonoutermost occurrences of the symbol AP on left-
hand sides (there is only one in this example, in the third equation) into a new
symbol, IAP (Inert APplication). Two additional equations are required to con-
vert AP to IAP when appropriate.
For all w, x, y, z:
AP(I(), x) = x;
subst(w, x, y) = if(equ(x, y), w, y)
    where y is in atomic_symbols end where;
subst(w, x, IAP(y, z)) = AP(subst(w, x, y), subst(w, x, z));
AP(x, y) = IAP(x, y) where x is in atomic_symbols end where;
AP(IAP(x, y), z) = IAP(IAP(x, y), z);
if(true, x, y) = x;    if(false, x, y) = y;
include equatom.
Notice that IAP is never used on the right-hand side. Essentially, the use of IAP
enforces innermost evaluation in those cases where left-hand sides used to overlap.
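The effect of the IAP discipline can be sketched in Python, with atoms as strings and applications as tagged tuples ('AP', f, a) / ('IAP', f, a). The tuple encoding, the eval_term driver, and the choice of I as the only reducible head are my own simplifying assumptions.

```python
I = 'I'   # the identity combinator, the only built-in function here

def is_atom(t):
    return isinstance(t, str)

def eval_term(t):
    # Reduce AP nodes; an application that cannot reduce becomes an inert
    # IAP node, so subst can later pattern-match on it safely.
    if is_atom(t):
        return t
    _, f, a = t
    f, a = eval_term(f), eval_term(a)
    if f == I:                       # AP(I, x) = x
        return a
    return ('IAP', f, a)             # AP(x, y) = IAP(x, y), parts evaluated

def subst(w, x, t):
    # subst(w, x, y) = if(equ(x, y), w, y)   where y is atomic
    if is_atom(t):
        return w if t == x else t
    # subst(w, x, IAP(y, z)) = AP(subst(w, x, y), subst(w, x, z))
    _, y, z = t
    return eval_term(('AP', subst(w, x, y), subst(w, x, z)))
```

Because subst only matches IAP, an application is substituted into only after its parts have been evaluated, which is the innermost ordering the added equations enforce.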
The technique of adding symbols to avoid overlap, illustrated in Example 12.4.3, is
essentially the same idea as that used by Thatte [Th85] to translate nonoverlapping
equations into the constructor discipline (see Section 12.1). Example 16.3.3 shows
how the same technique was used independently by Hoffmann, who thought of
overlap as a potential inconsistency in a concurrent program, and used locks to
avoid the inconsistency.
When a logically correct set of equations is rejected by the equation inter-
preter because of a failure of left-sequentiality, the first thing to try is reordering
the arguments to offending functions. If the program is part of a polished product,
the reordering may be accomplished in syntactic pre- and postprocessors. If the
trouble of modifying the syntactic processors is too great, the user, regrettably,
must get accustomed to seeing the arguments in the new order.
Example 12.4.4
Consider a program generalizing the LISP functions car and cdr to allow an arbi-
trarily long path to a selected subtree. select[t;p] is intended to select the subtree
of t reached by the path p from the root of t. Paths are presented as lists of
atomic symbols L and R, representing left and right branches, respectively.
Symbols
cons: 2;
nil: 0;
select: 2;
include atomic_symbols.
For all x, y, p:
select[x; ()] = x;
select[(x . y); (L . p)] = select[x; p];
select[(x . y); (R . p)] = select[y; p].
Left-sequentiality fails for the program above because, after seeing the symbol
select, there is no way to decide whether or not to inspect the first argument, seek-
ing a cons, or the second argument, seeking (). Only after seeing the second argu-
ment, and determining whether or not it is (), can the interpreter know whether or
not the first argument is relevant. Left-sequentiality is restored merely by revers-
ing the arguments to select.
For all x, y, p:
select[(); x] = x;
select[(L . p); (x . y)] = select[p; x];
select[(R . p); (x . y)] = select[p; y].
□
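The repaired, path-first select reads naturally in Python; here trees are 2-tuples, paths are lists of 'L'/'R', and the encoding is my own sketch. As in the reordered equations, the path is examined before the tree.

```python
def select(p, t):
    # select[(); x] = x
    if not p:
        return t
    head, rest = p[0], p[1:]
    left, right = t
    # select[(L . p); (x . y)] = select[p; x]
    # select[(R . p); (x . y)] = select[p; y]
    return select(rest, left if head == 'L' else right)
```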
Failures of left-sequentiality may also be repaired by artificially forcing the
interpreter to inspect an argument position, by replacing the variable with a dis-
junction of all possible forms substituted for that variable. This technique degrades
the clarity of the program, and risks the omission of some possible form. In the
worst case, the forced inspection of an ill-defined argument could lead to an
unnecessary infinite computation.
Example 12.4.5
The first program of Example 12.4.4 may also be repaired by replacing the first
equation,
select[x; ()] = x;
with the two equations,
select[x; ()] = x where x is in atomic_symbols end where;
select[(x . y); ()] = (x . y);
or, equivalently, by qualifying the variable x to be
either (y . z) or in atomic_symbols end or
□
Whenever permutation of arguments restores left-sequentiality, that method is pre-
ferred to the forced inspection. In some cases, however, argument permutation
fails where forced inspection succeeds.
Example 12.4.6
The parallel or function may be defined by
Symbols
or: 2;
include truth_values.
For all x:
or(true, x) = true;
or(x, true) = true;
or(false, false) = false.
Left-sequentiality fails because of the first two equations. If the second equation is
changed to
or(false, true) = true;
then left-sequentiality is restored. The or operator in the modified equations is
sometimes called the conditional or. The results are the same as long as evalua-
tion of the first argument to or terminates, but arbitrarily much effort may be
wasted evaluating the first argument when the second is true. In the worst case,
evaluation of the first argument might never terminate, so the final answer of true
would never be discovered.
□
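The contrast between parallel or and conditional or can be sketched in Python by representing each argument as a generator that yields None while still "working" and finally yields its truth value. The fair interleaving below is my own illustration of the intended semantics, not how the equation interpreter schedules work.

```python
def done(v):
    # A computation that settles immediately on the truth value v.
    yield v

def diverge():
    # A computation that never produces a value.
    while True:
        yield None

def par_or(a, b):
    # Parallel or: alternate one step of each argument per round, so a
    # true on either side is found even if the other side diverges.
    # Conditional or would instead run `a` to completion first.
    pending = [a, b]
    while pending:
        for g in list(pending):
            try:
                v = next(g)
            except StopIteration:
                pending.remove(g)
                continue
            if v is True:
                return True
            if v is False:
                pending.remove(g)     # this side settled on false
    return False
```

par_or(diverge(), done(True)) answers True after one round, whereas a conditional or applied to the same arguments would loop forever in the first argument.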
13. Use of Equations for Syntactic Manipulations
While computer programming in general remains something of a black art, certain
special problem areas have been reduced to a disciplined state where a competent
person who has learned the right techniques may be confident of success. In par-
ticular, the use of finite-state lexical analyzers and push-down parsers, generated
automatically from regular expressions and context-free grammars, has reduced the
syntactic analysis of programming languages, and other artificially designed
languages, to a reliable discipline [AU72]. The grammatical approach to syntactic
analysis is also beneficial because the grammatical notation is reasonably self-
documenting - sufficiently so that the same grammar may be used as input to an
automatic parser generator, and as appendix to a programmer’s manual, providing
a reliable standard of reference to settle subtle syntactic issues that are not
explained sufficiently in the text of the manual. Beyond context-free manipula-
tions, the best-known contender for formal generation of language processors is the
attribute grammar [Kn68, AU72]. An attribute grammar is a context-free gram-
mar in which each nonterminal symbol may have attribute values associated with
it, and each production is augmented with a description of how attributes of non-
terminals in that production may be computed from one another. Although they
have proved very useful in producing countless compilers, interpreters, and other
language processors, attribute grammars have not provided a transparency of nota-
tion comparable to that of context-free grammars, especially when, as is usually the
case, actual computation of one attribute from another is described in a conven-
tional programming language, such as C.
Linguists who seek formal descriptions of natural languages have tried to
boost the power of context-free grammars with transformational grammars
[Ch65]. A transformational grammar, like an attribute grammar, contains a
context-free grammar, but the context-free grammar is used to produce a tree,
which is then transformed by schematically presented transformation rules. The
parse tree produced by the context-free grammar is called the surface structure,
and the result of the transformations is called the deep structure of an input string
of symbols. A number of different formal definitions of transformational grammar
have been proposed, all of them suffering from complex mechanisms for controlling
the way in which tree transformations are applied. We propose the equation inter-
preter as a suitable mechanism for the transformational portion of transformational
grammars. By enforcing the confluence property, the equation interpreter finesses
the complex control mechanisms, and returns to a notation that has the potential
for self-documenting qualities analogous to those of context-free grammars. The
concepts of surface structure, which captures the syntactic structure of the source
text, without trivial lexical details, and deep structure, which is still essentially syn-
tactic, but which captures the structure of the syntactic concepts described by the
source text, rather than the structure of the text itself, appear to be very useful
ones in the methodology of automatic syntactic analysis. The analysis into surface
and deep structures should be viewed as a refinement of the idea of abstract syn-
tax - syntax in tree form freed of purely lexical issues - which has already proved a
useful organizing concept for language design [McC62, La65].
In this section, we propose that the concept of abstract syntax be made an
explicit part of the implementation of syntactic processors, as well as a design con-
cept. Rather than extend the traditional formalisms for context-free grammars, we
develop a slightly different notation, clearly as strong in its power to define sets of
strings, and better suited to the larger context of regular/context-free/equational
processing.
13.1 An Improved Notation for Context-Free Grammars
The tremendous success of context-free grammars as understandable notations for
syntactic structure comes from their close connection to natural concepts of type
structure. By making the connection even closer, we believe that clarity can be
improved, and deeper structural issues separated more cleanly from superficial lexi-
cal ones. The first step is to start, not with source text, which is a concrete realiza-
tion of a conceptual structure, but with the conceptual structure itself. By "concep-
tual structure," we mean what a number of programming language designers have
called abstract syntax [McC62, La65], and what Curry and Feys called formal
objects or Obs [CF58]. We do not intend meanings or semantic structures: rather
abstract, but syntactic, mental forms that are represented straightforwardly by
source texts. Just as programming language design should start with the design of
the abstract syntax, and seek the most convenient way to represent that abstract
syntax in text, the formal description of a language should first define the abstract
structure of the language, then its concrete textual realization. Abstract syntaxes
are defined by type assignments.
Definition 13.1.1
Let Σ be an alphabet on which an abstract term is to be built, and let Γ₀ be a set
of primitive types used to describe the uses of symbols in Σ.
A flat type over Γ₀ is either a primitive type t ∈ Γ₀, or
s₁ × ⋯ × sₙ → t where s₁, ⋯, sₙ, t ∈ Γ₀.
The set of all flat types over Γ₀ is called Γ.
A type assignment to Σ is a binary relation τ ⊆ Σ × Γ.
For f ∈ Σ, when τ is understood, f: t₁, ⋯, tₙ means that f τ tᵢ for i ∈ [1,n]. Usually
τ is a function, so n = 1.
The typed language of Σ and τ, denoted Σ_τ, is the set of all terms built from
symbols in Σ, respecting the types assigned by τ. Formally,
If (f,t) ∈ τ for t ∈ Γ₀, then f is in the typed language of Σ and τ, and f has type t
(i.e., f is a constant symbol of type t).
If E₁, ⋯, Eₙ are in the typed language of Σ and τ, with types s₁, ⋯, sₙ ∈ Γ₀, and
if f: s₁ × ⋯ × sₙ → t, then f(E₁, ⋯, Eₙ) is in the typed language of Σ and τ with
type t.
τ is extended so that E τ t, also written E: t, if t is the type of an arbitrary term
E ∈ Σ_τ.
□
f(E₁, ⋯, Eₙ) denotes an abstract term, or tree, with f at the head or root, and
subterms E₁, ⋯, Eₙ attached to the root in order. In particular, f(E₁, ⋯, Eₙ)
does not denote a particular string of characters including parentheses and
commas.
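A type assignment and its typed language are easy to realize concretely. The Python sketch below is my own: it assumes a type assignment is a dict mapping each symbol to a flat type (argument types, result type), with terms as (symbol, subterm, ...) tuples; the example symbols zero, succ, and pair are invented for illustration.

```python
def type_of(term, tau):
    """Type of `term` in the typed language of tau, or None if ill-typed."""
    head, args = term[0], term[1:]
    if head not in tau:
        return None
    arg_types, result = tau[head]
    if len(args) != len(arg_types):
        return None
    for sub, expected in zip(args, arg_types):
        # Every subterm must have exactly the type the flat type demands.
        if type_of(sub, tau) != expected:
            return None
    return result

# A small assignment of my own: zero is a constant of type nat,
# succ: nat -> nat, and pair: nat x nat -> prod.
tau = {'zero': ((), 'nat'),
       'succ': (('nat',), 'nat'),
       'pair': (('nat', 'nat'), 'prod')}
```

This is exactly the top-down check a tree automaton performs: a state (expected type) is passed to each son, and acceptance requires every leaf to check out.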
Abstract syntaxes are merely the typed languages of typed alphabets as
described above. These languages might naturally be called regular tree
languages, because they can be recognized by nondeterministic finite automata
running from the root to the leaves of a tree, splitting at each interior node to cover
all of the sons (a different next state may be specified for each son), and accepting
when every leaf yields an accepting state [Th73]. All of the technical devices of
this section are found in earlier theoretical literature on tree automata and gram-
mars [Th73], but we believe that the particular combination here is worthy of con-
sideration for practical purposes.
Type assignments are almost context-free grammars; they merely lack the ter-
minal symbols. Each primitive type t may be interpreted as a nonterminal symbol
Nₜ in the context-free grammar, and the type assignment f: s₁ × ⋯ × sₙ → t
corresponds to the production Nₜ → Nₛ₁ ⋯ Nₛₙ. Notice that the type assignment
has one piece of information lacking in the context-free production - the name f.
Names for context-free productions clarify the parse trees, and make issues such as
ambiguity less muddy. The terminal symbols, giving the actual strings in the
context-free language, are given separately from the type assignment. This separa-
tion has the advantage of allowing different concrete notations to be associated
with the same abstract syntax for different purposes (e.g., internal storage, display
on various output devices - the translation from a low-level assembly code to binary
machine code might even be represented as another notation for the assembly
code). Separation also clarifies the distinction between essential structure and
superficial typography. We do not suggest that typography is unimportant com-
pared to the structure, merely that it is helpful to know the difference.
The following definition introduces all of the formal concepts needed to define
context-free languages in a way that separates essential issues from typographical
ones, and that makes the relationship between strings and their abstract syntax
trees more explicit and more flexible than in traditional grammatical notation. The
idea is to define the abstract syntax trees first, by introducing the symbols that may
appear at nodes of the trees, and assigning a type to each. Legitimate abstract
syntaxes are merely the well-typed terms built from those symbols. Next, each
symbol in the abstract syntax is associated with one or more notational schemata,
showing how that symbol is denoted in the concrete syntax, as a string of charac-
ters. Auxiliary symbols may be used to control the selection of a notational
schema when a single abstract symbol has several possible denotations. This con-
trol facility does not allow new sets of strings to be defined, but it does allow a
given set of strings to be associated with abstract syntax trees in more ways than
can be done by traditional grammar-driven parsing.
The design of this definition was determined by three criteria.
1. The resulting notation must handle common syntactic examples from the com-
piler literature in a natural way.
2. The translations defined by the notation must be closed under homomorphic
(structure-preserving) encodings of terms. For example, if it is possible to
make terms built out of the binary function symbol f correspond to strings in
a particular way, then the same correspondence must be possible when f(x,y)
is systematically encoded as apply (apply (f,x),y), an encoding called Curry-
ing. If this criterion were not satisfied, a user might have to redesign the
abstract syntax in order to achieve typographical goals. The whole point of
the new notation is to avoid such interference between levels.
3. A natural subset of the new notation must equal the power of
syntax-directed translation schemata [Ir61, LS68, AU72].
Definition 13.1.2
Let cat be a binary symbol indicating concatenation. cat(α,β) is abbreviated αβ.
Let V = {x₁, x₂, ⋯} be a set of formal variables, V ∩ Σ = ∅. Let A be a finite alpha-
bet of auxiliary symbols, S ∈ A a designated start symbol. A notational
specification for a type assignment τ to abstract alphabet Σ in concrete alphabet
Ω with auxiliary alphabet A is a binary relation
η ⊆ (((Σ ∪ V)_τ′ ∪ {empty}) × A) × ({cat} ∪ Ω* ∪ (V × A))_γ, where τ′ is τ augmented so
that each variable in V has every type in Γ₀ (i.e., a variable may occur anywhere in
a term), and γ is the type assignment such that each word in Ω*, and each anno-
tated variable in V × A, is a single nullary symbol of type STRING, and cat is of
type STRING × STRING → STRING. empty denotes the empty term, without
even a head symbol. Without loss of generality, the variables on the left-hand term
in the η relation are always x₁, ⋯, xₘ, in order from left to right. Notational
specifications are described by productions of the form
    E^A is denoted F,
indicating that ⟨E,A⟩ η F. Within F, the auxiliary symbols are given as super-
scripts on the variables. When the same pair ⟨E,A⟩ is related by η to several
expressions F₁, ⋯, Fₘ, the m elements of the relation are described in the form
E^A is denoted F₁ or ⋯ or Fₘ. Multiple superscripts indicate that any of the
superscripts listed may be used. If only one element of A may appear on a particu-
lar symbol, then the superscript may be omitted.
A context-free notational specification is one in which the η relation is restricted
so that when ⟨E,A⟩ η F, no variable occurs more than once in F.
A simple notational specification is a context-free notational specification in which
the η relation is further restricted so that when ⟨E,A⟩ η F, the variables
x₁, ⋯, xₘ occur in order from left to right in F, possibly with omissions, but
without repetitions.
Each notational specification η defines an interpretation η̄ ⊆ (Σ_τ × A) × Ω*, associat-
ing trees (abstract syntax) with strings of symbols (concrete syntax). The interpre-
tation is defined by
    ⟨E,A⟩ η α  ⟹  ⟨E,A⟩ η̄ α  when E contains no variables.
    ⟨E[x₁, ⋯, xₘ], A⟩ η α[⟨x₁,B₁⟩, ⋯, ⟨xₙ,Bₙ⟩] &
    ⟨E₁,B₁⟩ η̄ β₁ & ⋯ & ⟨Eₘ,Bₘ⟩ η̄ βₘ &
    ⟨empty,Bₘ₊₁⟩ η̄ βₘ₊₁ & ⋯ & ⟨empty,Bₙ⟩ η̄ βₙ
        ⟹  ⟨E[E₁, ⋯, Eₘ], A⟩ η̄ α[β₁, ⋯, βₙ]
F η̄ α abbreviates ⟨F,S⟩ η̄ α, where S is the start symbol.
The string language of a notational specification η at type t is
    {α ∈ Ω* | ∃F ∈ Σ_τ, (F,t) ∈ τ & F η̄ α}.
□
The formal definition above is intuitively much simpler than it looks. Notational
specifications give nondeterministic translations between terms and strings in a
schematic style using formal variables. The intent of a single production is that
the terms represented by the variables in any instance of that production should be
translated first, then those translations should be combined as shown in the produc-
tion. In most cases, the left-hand side of a production is of the simple form
f(x₁, ⋯, xₙ). This form is enough to satisfy design criterion 1 above. The more
complex terms allowed on left-hand sides of productions are required to satisfy cri-
terion 2, and the empty left-hand sides are required for criterion 3. Greater
experience with the notation is required in order to judge whether the added gen-
erality is worth the trouble in understanding its definition.
The string languages of context-free notational specifications and simple nota-
tional specifications are precisely the context-free languages, but notational
specifications offer more flexibility than context-free grammars for defining tree-
string relations. Since the relation between the parse tree and a string has a prac-
tical importance far beyond the set of parsable strings, this added flexibility
appears to be worth the increase in complexity of the specifications, even if it is
never used to define a non-context-free language. The type-notation pairs defined
here are more powerful than the syntax-directed translation schemata of Aho
and Ullman [AU72] when the abstract-syntax trees are encoded as strings in post-
order. Context-free notational specifications are equivalent in power to syntax-
directed translation schemata, and simple notational specifications are equivalent in
power to simple syntax-directed translation schemata and to pushdown trans-
ducers. Independently of the theoretical power involved, the notation of this sec-
tion should be preferred to that of syntax-directed translation schemata, since it
makes the intuitive tree structure of the abstract syntax explicit, rather than encod-
ing it into a string. Notice that auxiliary symbols are equivalent to a restricted use
of inherited attributes in attribute grammars. Since these attributes are chosen
from a finite alphabet, they do not increase the power of the formalism as a
language definer.
A notational specification is generally not acceptable unless it is complete,
that is, every abstract term in the typed language has at least one denotation.
Completeness is easy to detect automatically. Unfortunately, ambiguity, that is,
the existence of two or more denotations for the same abstract term, is equivalent
to ambiguity in context-free grammars, which is undecidable. For generating
parsers, ambiguity is deadly, since the result of parsing is not well-defined. Just as
with context-free grammars, we should probably enforce stronger decidable
sufficient conditions for nonambiguity. For unparsers, ambiguity is usually undesir-
able, but it might be occasionally useful when there is no practical need to distin-
guish between two abstract terms.
Every context-free grammar decomposes naturally into a type assignment and
a simple context-free notational specification, with a trivial A, and 7 a function.
The natural decomposition is unique except for ordering of the arguments to each
function symbol, and choice of names for the function symbols. Similarly, every
context-free type-notation pair defines a unique natural context-free grammar.
Example 13.1.1
Consider a context-free grammar for arithmetic expressions.
S → N
S → S+S
S → S*S
S → (S)
N → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
N → N0 | N1 | ⋯ | N9
This grammar decomposes naturally into the type assignment
numeral: N → S
plus: S × S → S
times: S × S → S
paren: S → S
digit0: N  ⋯  digit9: N
extend0: N → N  ⋯  extend9: N → N
and the context-free notational specification
numeral(x₁) is denoted x₁
plus(x₁,x₂) is denoted x₁+x₂
times(x₁,x₂) is denoted x₁*x₂
paren(x₁) is denoted (x₁)
digit0 is denoted 0  ⋯  digit9 is denoted 9
extend0(x₁) is denoted x₁0  ⋯  extend9(x₁) is denoted x₁9
The type-notation pair derived above is not the most intuitive one for arithmetic
expressions, because the paren symbol has no semantic content, but is an artifact
of the form of the grammar. To obtain a more intuitive notational specification,
for the type assignment that omits paren, delete the paren line from the notational
specification above, and replace the plus and times lines by the following.
plus(x₁,x₂) is denoted x₁+x₂ or (x₁+x₂)
times(x₁,x₂) is denoted x₁*x₂ or (x₁*x₂)
The auxiliary alphabet A is still trivial, but the notational relation 7 is no longer a
function.
□
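Unparsing under the paren-free specification above is a direct recursion on the abstract term. The Python sketch below is my own (tagged-tuple terms, always choosing the unparenthesized alternative); it also exhibits the ambiguity discussed next, since two distinct trees denote the same string.

```python
def unparse(t):
    head, args = t[0], t[1:]
    if head == 'plus':                   # plus(x1,x2) is denoted x1+x2
        return unparse(args[0]) + '+' + unparse(args[1])
    if head == 'times':                  # times(x1,x2) is denoted x1*x2
        return unparse(args[0]) + '*' + unparse(args[1])
    if head == 'numeral':                # numeral(x1) is denoted x1
        return unparse(args[0])
    if head.startswith('digit'):         # digitD is denoted D
        return head[-1]
    if head.startswith('extend'):        # extendD(x1) is denoted x1D
        return unparse(args[0]) + head[-1]
```

Both plus(1, times(2,3)) and times(plus(1,2),3) unparse to the string 1+2*3, so a parser for this specification has no well-defined answer on that input.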
The auxiliary symbols in A are not strictly necessary for defining an arbitrary
context-free language, but they allow technical aspects of the parsing mechanism to
be isolated in the notational specification, instead of complicating the type assign-
ment. For example, the extra nonterminal symbols often introduced into grammars
to enforce precedence conditions should not be treated as separate types, but
merely as parsing information attached to a single type.
Example 13.1.2
The grammar of Example 13.1.1 is ambiguous. For example, 1+2*3 may be
parsed as plus(1, times(2,3)), or times(plus(1,2),3). The usual way to avoid the
ambiguity in the context-free grammar is to expand the set of nonterminal symbols
as follows.
S → S+T
S → T
T → T*R
T → R
R → (S)
R → N
N → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
N → N0 | N1 | ⋯ | N9
This grammar gives precedence to * over +, and associates sequences of the same
operation to the left. The direct translation of this grammar yields the following
type assignment and notational specification.
plus: S × T → S
summand: T → S
times: T × R → T
multiplicand: R → T
paren: S → R
numeral: N → R
digit0: N  ⋯  digit9: N
extend0: N → N  ⋯  extend9: N → N

plus(x₁,x₂) is denoted x₁+x₂
summand(x₁) is denoted x₁
times(x₁,x₂) is denoted x₁*x₂
multiplicand(x₁) is denoted x₁
paren(x₁) is denoted (x₁)
numeral(x₁) is denoted x₁
digit0 is denoted 0  ⋯  digit9 is denoted 9
extend0(x₁) is denoted x₁0  ⋯  extend9(x₁) is denoted x₁9
As well as paren, we now have the semantically superfluous symbols summand and
multiplicand, and types T and R that are semantically equivalent to S. In order
to keep the abstract syntax of the second part of Example 13.1.1, while avoiding
ambiguity, the parsing information given by these semantically superfluous symbols
and types should be encoded into auxiliary symbols S, T, R as follows.
plus: S × S → S
times: S × S → S
numeral: N → S
digit0: N  ⋯  digit9: N
extend0: N → N  ⋯  extend9: N → N

plus(x₁,x₂)^S is denoted x₁^S+x₂^T or (x₁^S+x₂^T)
plus(x₁,x₂)^{T,R} is denoted (x₁^S+x₂^T)
times(x₁,x₂)^{S,T} is denoted x₁^T*x₂^R or (x₁^T*x₂^R)
times(x₁,x₂)^R is denoted (x₁^T*x₂^R)
numeral(x₁)^{S,T,R} is denoted x₁
digit0 is denoted 0  ⋯  digit9 is denoted 9
extend0(x₁) is denoted x₁0  ⋯  extend9(x₁) is denoted x₁9
□
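The auxiliary symbols need never be stored in the term itself; they only thread through the printing process. A hypothetical Python sketch of unparsing under the collapsed specification above, where the context symbol alone decides whether parentheses are emitted:

```python
# Context-driven unparser: ctx is the auxiliary symbol (S, T, or R)
# of the position being printed; it determines parenthesization.
def unparse(term, ctx='S'):
    op = term[0]
    if op == 'plus':
        body = unparse(term[1], 'S') + '+' + unparse(term[2], 'T')
        return body if ctx == 'S' else '(' + body + ')'
    if op == 'times':
        body = unparse(term[1], 'T') + '*' + unparse(term[2], 'R')
        return body if ctx in ('S', 'T') else '(' + body + ')'
    if op == 'numeral':
        return str(term[1])                   # numerals are legal in every context
    raise ValueError(op)
```

Unparsing times(plus(1,2),3) in this way produces (1+2)*3, with the parentheses generated by the T context rather than stored in the abstract term.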
Non-context-free notational specifications allow matching of labels at the
beginning and end of bracketed sections. For example, to define procedure
definitions in a conventional programming language so that the name of a pro-
cedure appears at the end of its body, as well as in the heading, use the following
notational specification:
procedure(x₁,x₂,x₃) is denoted PROCEDURE x₁(x₂); x₃ END(x₁)
Given current software that is available off-the-shelf, the best thing to do with
a context-free type-notation pair is probably to convert it into a context-free gram-
mar, and apply the usual parsing techniques. The new notation does not show to
greatest advantage under such usage, since the tricks that are required to avoid
parsing conflicts may complicate the grammar by introducing otherwise unneces-
sary auxiliary symbols. It is probably not a good idea to try to parse with non-
context-free type-notation pairs, although unparsing is no problem.
The ideal application of type-notation pairs depends on future availability of
structure-editor generators, as an alternative to parsers [MvD82, Re84]. Structure
editors are fruits of the observation that source text was developed for particular
input techniques, such as the use of punched cards, involving substantial off-line
preparation before each submission. During the off-line work, a user needs to work
with a presentation of his program that is simultaneously machine-readable and
human-readable. With highly interactive input techniques, based on video termi-
nals, there is no more requirement that what the user types correspond character
by character to what he sees on the screen. Structure editors let short sequences of
keystrokes produce simple manipulations of tree structures (i.e., abstract syntaxes),
and instantly display the current structure on the screen. The process that must
be automated is unparsing, a process technically much simpler than parsing, and
immune to the problems of nondeterminism that arise in general context-free pars-
ing. Type assignments are ideal for defining the internal structures to be manipu-
lated by structure editors. The notational components still need strengthening to
deal with the two-dimensional nature of the video display. Unparsing is easy even
for non-context-free type-notation pairs.
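As a small illustration of that last claim, unparsing the PROCEDURE notation of Section 13.1 is a plain traversal that simply emits the name twice, once in the heading and once after END; a parser would instead have to check the END label against the heading. The constructor names below are hypothetical.

```python
# Unparser for the non-context-free procedure notation: the matched
# label is produced by printing the same subterm in two places.
def unparse_proc(term):
    if term[0] == 'procedure':
        name, params, body = term[1], term[2], term[3]
        return 'PROCEDURE %s(%s);%s END(%s)' % (
            name, unparse_proc(params), unparse_proc(body), name)
    if term[0] == 'atom':
        return term[1]
    raise ValueError(term[0])
```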
13.2. Terms Representing the Syntax of Terms
When equations are used to perform syntactic processing, we often need a way to
distinguish different levels of meaning. For example, there is no way to write an
equational program translating an arbitrary list of the form (F A₁ ... Aₙ) to
F[A₁; ...; Aₙ]. We might be tempted to write
translate[(x . y)] = x[transargs[y]],
but F is a nullary symbol in the first instance, and an n-ary function symbol in the
second. Also, the use of the variable x in place of a function symbol on the right-
hand side is not allowed. Yet, the translation described above is a very natural
part of a translation of LISP programs into equational programs. The problem is,
that the symbol translate is defined above as if it operates on the objects denoted
by S-expressions, when, in fact, it is supposed to operate on the expressions them-
selves. Consider the further trouble we would have with a syntactic processor that
counts the number of leaves in an arithmetic expression. We might write
count_leaves (add (x, y)) = add (count_leaves (x), count_leaves (y)).
Technically, this equation overlaps with the defining equations for add. Intuitively,
it is a monster.
To achieve sufficient transformational power for the LISP example, and to
avoid the confusion of the add example, we need a notation for explicitly describ-
ing other notations -- terms denoting terms. As long as we accept some equations
on terms, we are in trouble letting terms stand for themselves. Rather, we need
explicit functions whose use is to construct terms. One natural set of functions for
this purpose uses list notation based on nil and cons, as in LISP, plus
atomic_symbols as names of symbols within the terms being described, unary func-
tions litsym, atomsym, intnum, truthsym, and char to construct symbols of vari-
ous types from their names, and multiap to apply a function symbol syntactically
to a list of argument terms.
Example 13.2.1
Using the notation defined above, we may translate LISP S-expressions to the func-
tional terms that they often represent. For example, we must translate
multiap[litsym[cons]; (atomsym[F] . ...)] to multiap[litsym[F]; (...)]. This
may be accomplished naturally by the following equational program.
Symbols
    : Constructors for terms
    nil: 0;
    cons: 2;
    litsym, atomsym, intnum, truthsym, char: 1;
    multiap: 2;
    include atomic_symbols;
    : Translating operators
    translate: 1;
    transargs: 1.
For all x, y:
    translate[litsym[x]] = litsym[x];
    translate[atomsym[x]] = atomsym[x];
    translate[intnum[x]] = intnum[x];
    translate[truthsym[x]] = truthsym[x];
    translate[char[x]] = char[x];
    translate[multiap[litsym[cons]; (atomsym[x] . y)]] =
        multiap[litsym[x]; transargs[y]];
    transargs[()] = ();
    transargs[(x . y)] = (translate[x] . transargs[y]).
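The same translation can be rendered in Python, with the explicit-syntax constructors as tagged tuples and cons-lists as Python lists. This is an illustrative analogue, not the interpreter's own code.

```python
# Python analogue of translate/transargs: rewrite an S-expression
# description (F A1 ... An) into the functional description F[A1; ...; An].
def translate(t):
    tag = t[0]
    if tag in ('litsym', 'atomsym', 'intnum', 'truthsym', 'char'):
        return t                              # symbol constructors pass through
    if tag == 'multiap' and t[1] == ('litsym', 'cons'):
        head, args = t[2][0], t[2][1:]        # (F A1 ... An) with F atomic
        assert head[0] == 'atomsym'
        return ('multiap', ('litsym', head[1]), transargs(args))
    raise ValueError(tag)

def transargs(l):
    return [translate(x) for x in l]
```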
Similarly, the leaf counting program becomes
Symbols
    : Constructors for terms
    nil: 0;
    cons: 2;
    litsym, intnum: 1;
    multiap: 2;
    include atomic_symbols;
    : Counting operator
    count: 1;
    : Arithmetic operator
    add: 2.
For all x, y:
    count[intnum[x]] = 1;
    count[multiap[litsym[add]; (x y)]] = add[count[x]; count[y]].
□
The examples above are not appealing to the eye, but the potential for confusion is
so great that precision seems to be worth more than beauty in this case. Special
denotations involving quote marks might be introduced, but they should be taken as
abbreviations for an explicit form for denoting syntax, such as the one described
above.
In order to apply an equational program to perform a syntactic manipulation,
the term to be manipulated should be in an explicitly syntactic form. Yet, the ini-
tial production of that term as input, and the final output form for it, are unlikely
to be explicitly syntactic themselves. For example, one use of the translation of S-
expressions to functional forms is to take an S-expression, translate it into a func-
tional form, evaluate the functional form, then translate back to an S-expression.
The user who presents the initial S-expression only wants to have it evaluated, so
he should not need to be aware of the syntactic translations, and should be allowed
to present the S-expression in its usual form. In order to evaluate the functional
form, it must not be given as explicit syntax, else it will not be understood by an
evaluator of functional forms. From these considerations, we seem to need
transformers in and out of explicit syntactic forms. These translators cannot be
defined by the equation interpreter, without getting into an infinite regress, since
the transformation of a term into its explicit syntactic form is itself a syntactic
transformation. So, the implementation of the equation interpreter includes two
programs called syntax and content. syntax transforms a term into its explicit
syntactic form, and content transforms an explicit syntactic form into the term that
it denotes. Thus, the syntax of (A . B) is
multiap[litsym[cons]; (atomsym[A] atomsym[B])],
and the content of
multiap[litsym[f]; (atomsym[A] intnum[22] atomsym[B])]
is f[A; 22; B].
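An illustrative Python model of the syntax/content pair follows, with atoms as strings, integers as ints, and applications as (name, [arguments]). syntax wraps a term into its explicit syntactic description, and content strips the description back off; the two are inverses on such terms.

```python
# syntax: term -> explicit syntactic form; content: its inverse.
def syntax(t):
    if isinstance(t, str):
        return ('atomsym', t)                 # an atomic symbol
    if isinstance(t, int):
        return ('intnum', t)
    f, args = t                               # an application f[a1; ...; an]
    return ('multiap', ('litsym', f), [syntax(a) for a in args])

def content(d):
    if d[0] in ('atomsym', 'intnum'):
        return d[1]
    # d = ('multiap', ('litsym', f), args)
    return (d[1][1], [content(a) for a in d[2]])
```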
13.3. Example: Type-Checking in a Term Language
A typical problem in non-context-free syntactic processing is type checking, where
the types of certain symbols are given by declarations lexically distant from the
occurrences of the symbols themselves. While the most popular application of type
checking occurs in processing conventional programming languages, such as Pascal,
the essential ideas can be understood more easily by considering a simpler
language, consisting of declarations followed by a single term. An advanced
compiler class taught by Christoph Hoffmann constructed a full type checker for
Pascal in 1983. Substantial extra effort was required to enforce declare-before-use
and similar restrictions included in the design of Pascal mostly to simplify conven-
tional compiler implementations.
We present the term language in the type-notation form of Section 13.1, and
assume that some mechanism is already available to translate from concrete syntax
to abstract syntax. The type S denotes a list of declarations paired with a term, E
denotes a term, D a declaration, A an atomic symbol, and T a type. EL, DL, and
TL denote lists of terms, declarations, and types, respectively. S, E, and T are
also used as auxiliary symbols, corresponding intuitively to their uses as types. The
auxiliary symbol P indicates a type occurring within the left-hand side of a
functional type, and so needing to be parenthesized if it is a functional type itself. EC
and DC indicate nonempty lists of terms and declarations, respectively.
typed_term: DL × E → S
cons: D × DL → DL, E × EL → EL, T × TL → TL
nil: DL, EL, TL
declaration: A × T → D
type: A → T
function: T × TL → T
term: A → E
multiap: E × EL → E
typed_term(dl,e) is denoted dl^DC . e^E
cons(d,dl)^DC is denoted d; dl^DC or d dl^N
declaration(a,t) is denoted a: t^T
type(a)^T,P is denoted a
function(t,tl)^T is denoted tl^TC → t^T
function(t,tl)^P is denoted (tl^TC → t^T)
cons(t,tl)^TC is denoted t^P × tl^TC or t^P tl^N
term(a)^E,P is denoted a
multiap(e,el)^E is denoted e^P(el^EC)
multiap(e,el)^P is denoted (e^P(el^EC))
cons(e,el)^EC is denoted e^E, el^EC or e^E el^N
nil^N is denoted ε
The operator multiap above applies a function, which may be given by an arbi-
trarily complex term, to a list of arguments. term constructs a primitive term, and
type a primitive type, from an atomic symbol that is intended to be the name of
the term or type. function(t,tl) represents the type of functions whose arguments
are of the types listed in tl, and whose result is of type t. The denotational
specifications above produce the minimum parenthesization needed to avoid ambi-
guity. The empty list is denoted by the empty string. A typical element of the
language above is given by the concrete form
f: t×t→t;
g: (t×t→t)×t→t→t;
a: t.
f(a, (g(f,a))(a))
and the abstract form, given in LISP.M notation,
typed_term[
    (declaration[f; function[type[t]; (type[t] type[t])]]
     declaration[g;
         function[
             function[type[t]; (type[t])];
             (function[type[t]; (type[t] type[t])] type[t])]]
     declaration[a; type[t]]);
    multiap[
        term[f];
        (term[a]
         multiap[
             multiap[term[g]; (term[f] term[a])];
             (term[a])])]]
Although the presentation of the abstract syntax above is not very readable for
even moderate sized declarations and terms, it has the flexibility needed to describe
a wide class of computations on the terms. The substantial ad hoc extensions to
the concrete syntax that would be required to denote portions of well-formed
expressions, and to present equations with variables, would end up being more
confusing than the general abstract form. In particular, the distinction between
semantic function application, involving the operators performing the type check-
ing, and syntactic function application in the term being checked, requires some
departure from conventional concrete notation. So, even though it is a very poor
display notation, the abstract syntax may be a good internal notation for defining
computations.
The following equational program uses the LISP.M notation to define type
checking for the typed terms described above. We assume that equations are given
defining operations on symbol tables, as described in Section 16.3.
: check[typed_term[dl; e]] evaluates to true if the term e is type correct
: with respect to the declarations in dl.
Symbols:
: constructors for typed terms
: declaration[a;t] declares the atomic symbol a to have type t
: type[a] is a primitive type named by the atomic symbol a
: function[t;tl] is the type of functions with argument types given
:     by the list tl, and value of type t
: term[a] is a primitive term named by the atomic symbol a
: multiap[e;el] is a term with head function symbol e applied to the list of
:     arguments el
typed_term: 2;
declaration: 2;
type: 1;
function: 2;
term: 1;
multiap: 2;
: standard list constructors
cons: 2;
nil: 0;
: type manipulating operations
typeof: 2;
typelist: 2;
resulttype: 1;
argtypes: 1;
: primitive symbol table operations
entertable: 3;
emptytable: 0;
lookup: 2;
: special type-checking operations
check: 1;
buildtable: 1;
looklist: 2;
checkargs: 2;
typecheck: 2;
: standard logical symbols
equ: 2;
and: 2;
include truth_values;
: atomic symbols used to identify type, constant, and function symbols
include atomic_symbols.
For all d, dl, e, el, t, t1, t2, tl, tl1, tl2, a, a1, a2, st, b:
: To check a typed term, build a symbol table from the declarations,
: then check the term against the symbol table.
check[typed_term[dl; e]] = typecheck[buildtable[dl]; e];
: The symbol table is built by the natural iteration through the list of
: declarations.
buildtable[()] = emptytable[];
buildtable[(declaration[a;t] . dl)] = entertable[buildtable[dl]; a; t];
: Final type checking goes recursively through the term. Whenever a
: function application is encountered, the type of the function is computed and
: checked for consistency with the types of the arguments.
typecheck[st; ()] = true;
typecheck[st; (e . el)] = and[typecheck[st; e]; typecheck[st; el]];
typecheck[st; term[a]] = true;
typecheck[st; multiap[e; el]] =
    and[and[typecheck[st; e];
            typecheck[st; el]];
        checkargs[argtypes[typeof[st; e]]; typelist[st; el]]];
typelist[st; ()] = ();
typelist[st; (e . el)] = (typeof[st; e] . typelist[st; el]);
typeof[st; term[a]] = lookup[st; a];
typeof[st; multiap[e; el]] = resulttype[typeof[st; e]];
resulttype[function[t; tl]] = t;
argtypes[function[t; tl]] = tl;
checkargs[(); ()] = true;
checkargs[(); (t . tl)] = false;
checkargs[(t . tl); ()] = false;
checkargs[(t1 . tl1); (t2 . tl2)] = and[equ[t1; t2]; checkargs[tl1; tl2]];
: Assume that equations are given for entertable, lookup.
: The standard equality test on atomic symbols is extended to type
: expressions by the natural recursion.
equ[type[a1]; type[a2]] = equ[a1; a2];
equ[function[t1; tl1]; function[t2; tl2]] =
    and[equ[t1; t2]; checkargs[tl1; tl2]];
equ[function[t; tl]; type[a]] = false;
equ[type[a]; function[t; tl]] = false;
include equatom;
: and is the standard boolean function.
and[true; b] = b;
and[false; b] = false.
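The same computation can be sketched compactly in Python (illustrative names, not the interpreter's code): types are ('type', a) or ('function', result, [argtypes]), terms are ('term', a) or ('multiap', head, [args]), and the symbol table is a plain dict.

```python
# Python analogue of buildtable / typeof / typecheck.
def buildtable(decls):
    # the natural iteration through the list of declarations
    return {a: t for (a, t) in decls}

def typeof(st, e):
    if e[0] == 'term':
        return st[e[1]]                       # lookup[st; a]
    return typeof(st, e[1])[1]                # resulttype of the head's type

def typecheck(st, e):
    if e[0] == 'term':
        return True
    head, args = e[1], e[2]
    ok = typecheck(st, head) and all(typecheck(st, a) for a in args)
    head_t = typeof(st, head)
    # checkargs: the declared argument types must equal the computed ones,
    # compared structurally (this plays the role of equ).
    return (ok and head_t[0] == 'function'
            and head_t[2] == [typeof(st, a) for a in args])
```

On the running example, f applied to a and (g(f,a))(a) checks successfully, while f applied to a single argument fails the checkargs comparison.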
The equations above may easily be augmented to report conflicts and omis-
sions in the declarations. If type checking is to be followed by some sort of seman-
tic processing, such as interpreting or translating the term, it may be useful to
attach types (or other information from a symbol table) to the symbols in a term,
so that the semantic processing does not need the symbol table. This technique
was used in the implementation of the equation interpreter itself: each variable
symbol in a left-hand side term is annotated to indicate the allowable substitutions
for that variable, and each variable in a right-hand side term is annotated to show
its address on the corresponding left-hand side. These annotations provide pre-
cisely the information needed by the semantic portion of the interpreter. To illus-
trate the technique, we present equations annotating each symbol in a typed term
with its type.
: typenotes[st; e] annotates each symbol in the term e with the type assigned to it
: by the symbol table st. Type conflicts are marked for debugging purposes.
Symbols:
: constructors for types and terms used as in the previous program
type: 1;
function: 2;
term: 1;
multiap: 2;
: extra constructors for annotated terms
: aterm[a;t] is a primitive term named by the atomic symbol a of type t
: typeconflict[e] marks the term e as having a type conflict in the application
:     of its head function symbol to inappropriate arguments
aterm: 2;
typeconflict: 1;
: standard list constructors
cons: 2;
nil: 0;
: type manipulating operations
typeof: 2;
typelist: 2;
resulttype: 1;
argtypes: 1;
: primitive symbol table operations
lookup: 2;
: special type-checking operations
looklist: 2;
checkargs: 2;
typenotes: 2;
: standard logical symbols
equ: 2;
and: 2;
if: 3;
include truth_values;
: atomic symbols are used to identify type, constant, and function symbols
include atomic_symbols.
For all d, dl, e, el, t, t1, t2, tl, tl1, tl2, a, a1, a2, st, b, x, y:
: Annotation goes recursively through the term. Whenever a function
: application is encountered, the type of the function is computed and checked
: for consistency with the types of the arguments.
typenotes[st; ()] = ();
typenotes[st; (e . el)] = (typenotes[st; e] . typenotes[st; el]);
typenotes[st; term[a]] = aterm[a; lookup[st; a]];
typenotes[st; multiap[e; el]] =
    if[checkargs[argtypes[typeof[st; e]]; typelist[st; el]];
       multiap[typenotes[st; e]; typenotes[st; el]];
       typeconflict[multiap[typenotes[st; e]; typenotes[st; el]]]];
: Assume that equations are given for entertable, lookup.
: Equations for the remaining operations are the same as in the previous program.
if[true; x; y] = x; if[false; x; y] = y.
14. Modular Construction of Equational Definitions
The equational programming language, although its concepts are far from being
primitive operations on conventional computing machines, is not really a high-level
language. Rather, it is the assembly language for an unusual abstract machine.
The problem is that the set of equations constituting a program has no structure to
help in organizing and understanding it. In order to be suitable for solving any but
rather small problems, the equational programming language needs constructs that
allow decomposition of a solution into manageable components, or modules, with
semantically elegant combining operations. In particular, different sets of equa-
tions, presented separately, must be combinable other than by textual concatena-
tion, and the combining operation must protect the programmer from accidental
coincidence of symbols in different components. Furthermore, sets of equations
must be parameterized to allow variations on a single concept (e.g., recursion over
a tree structure) to be produced from a single definition, rather than being gen-
erated individually. In this chapter, we define a speculative set of combining
operations for equational programs, and discuss possible means of implementation.
None of these features has been implemented for the equational programming
language, although some similarly motivated features are implemented in OBJ
[BG77, Go84]. Libraries of predefined equations cannot be integrated well into the
equation interpreter until some good structuring constructs are implemented.
In order to combine sets of equations in coherent ways, we need to design the
combining operations in terms of the meanings of equational programs, rather than
their texts. Based on the scenario of Section 1, the meaning of an equational pro-
gram should be characterized by three things:
1. the language of terms over which equations are given;
2. a class of models of that language;
3. the subset of terms that are considered simple, or transparent, enough to be
allowed as output.
Items 1 and 2 are standard concepts from universal algebra. For equational pro-
gramming, the information in 2 may be given equivalently as a congruence relation
on terms, but extensions to other logical languages might need the greater general-
ity of classes of models. In any case, the use of classes of models is better in keep-
ing with the intuitive spirit of our computing scenario.
Definition 14.1
Let Σ be a ranked alphabet, ρ(a) the rank of a ∈ Σ, and Σ_Δ the set of terms over Σ.
A model of Σ is a pair <U,ψ>, where U, the universe, is any set, and ψ is a mapping
from Σ to functions over U with ψ(a): U^ρ(a) → U.
ψ extends naturally to Σ_Δ by ψ(f(E₁, ..., Eₙ)) = ψ(f)(ψ(E₁), ..., ψ(Eₙ)).
A model <U,ψ> satisfies an equation E=F, written <U,ψ> |= E=F, if
ψ(E) = ψ(F).
A set W of models satisfies a set E of equations, written W |= E, if
<U,ψ> |= E=F for all <U,ψ> ∈ W, E=F ∈ E.
The set of models of a set E of equations, written Mod(E), is
{<U,ψ> | <U,ψ> |= E}.
The set of normal forms of a set E of equations, written Norm(E), is
{E ∈ Σ_Δ | for all F=G ∈ E, no instance of F is a subterm of E}.
A computational world is a triple <Σ,W,N>, where Σ is a ranked alphabet, W is
a class of models of Σ, and N ⊆ Σ_Δ. N is called the output set.
The world of a set of equations E over Σ, written World(E), is
<Σ, Mod(E), Norm(E)>.
□
The function World assigns meanings to equational programs by letting Σ be the
ranked alphabet defined by the Symbols section, and taking World(E) where E is
the set of all instances of equations in the program. The computing scenario of
Section 1 may be formalized by requiring, on input E, an output F ∈ Norm(E)
such that Mod(E) |= E=F. In principle, we would like separate control of the
models and the output terms, but the pragmatic restriction that output terms must
be precisely the normal forms -- terms with no instances of left-hand sides of
equations -- limits us to constructs that generate both of these components of program
meanings in ways that make them compatible with the evaluation mechanism
of reduction. Notice that uniqueness of normal forms means that there may not be
two different terms F₁, F₂ ∈ N with W |= F₁=F₂, i.e., there must exist at least one
model in W in which ψ(F₁) ≠ ψ(F₂).
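The normal-form test just described is directly programmable: a term is in Norm(E) iff no instance of a left-hand side occurs as a subterm. In the following sketch, terms are (symbol, [subterms]) tuples and pattern variables are strings starting with '?' (a convention of this illustration, not of the interpreter).

```python
# Membership test for Norm(E), given the left-hand sides of E as patterns.
def match(pat, term, binding):
    """Try to extend binding so that pat instantiates to term."""
    if isinstance(pat, str):                  # a pattern variable such as '?x'
        binding.setdefault(pat, term)
        return binding[pat] == term           # nonlinear occurrences must agree
    return (pat[0] == term[0] and len(pat[1]) == len(term[1])
            and all(match(p, s, binding) for p, s in zip(pat[1], term[1])))

def subterms(t):
    yield t
    for s in t[1]:
        yield from subterms(s)

def is_normal_form(lhss, t):
    return not any(match(l, s, {}) for l in lhss for s in subterms(t))
```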
Explicit presentation of an equational program is one way to specify a computational
world; the purpose of this section is to explore others. Because this aspect
of our study of equational logic programming is quite new and primitive, we do not
try to build a convenient notation for users yet. Rather, we explore constructs that
are both meaningful and implementable, and leave the design of good syntax for
them to the future. The OBJ project has gone much farther in the user-level
design of constructs for structured definition of equational programs, and the con-
structs proposed here are inspired by that work [BG77, Go84]. Our problem is to
determine how well such constructs can be implemented in a form consistent with
our computational techniques. In particular, we prefer implementations in which
all of the work associated with the structuring constructs is done during the prepro-
cessing of equations, so that the resulting interpreter is of the same sort that we
derive directly from a set of equations.
The most obvious way of constructing a computational world is to combine the
information in two others.
Definition 14.2
Given two ranked alphabets Σ₁ ⊆ Σ₂, and a model <U,ψ> of Σ₂, we may restrict
this model naturally to a model of Σ₁ by restricting ψ to Σ₁. When two classes of
models, W₁ and W₂, are given over different alphabets, Σ₁ and Σ₂, the intersection
W₁ ∩ W₂ is the set of all models of Σ₁ ∪ Σ₂ whose restrictions to Σ₁ and Σ₂ are in
W₁ and W₂, respectively. In the case where Σ₁ = Σ₂, this specializes naturally to the
conventional set-theoretic intersection.
Given two computational worlds <Σ₁,W₁,N₁> and <Σ₂,W₂,N₂>, the sum
<Σ₁,W₁,N₁> + <Σ₂,W₂,N₂> is <Σ₁∪Σ₂, W₁∩W₂, N₁∪N₂>.
□
The sum described above is similar to the enrich operation of [BG77], except that
enrich requires one of its arguments to be given by explicit equations. The sum of
computational worlds corresponds roughly to concatenation of two sets of equa-
tions. Even this simple combining form cannot be implemented by simple-minded
textual concatenation, because a variable in one program may be a defined symbol
in the other. If an equational program is syntactically cooked into a form where
each instance of a variable is marked unambiguously as such, then concatenation of
cooked forms will often accomplish the sum of worlds. The equation interpreter is
implemented in such a way that the cooked form is created explicitly, and may pro-
vide the basis for a very simple implementation of the sum. For greater efficiency,
we would like to implement the sum at the level of the pattern-matching tables
that drive the interpreter. Such an implementation depends critically on the choice
of pattern matching technique (see Section 18.2).
The concatenation of syntactically cooked programs described above succeeds
only when the combined equations satisfy the restrictions of Section 5. Certainly,
this need not be the case, since the sum of computational worlds with unique normal
forms may not have unique normal forms. For example, combining the singleton
sets {a=b} and {a=c} fails to preserve uniqueness of normal forms. This prob-
lem is inherent in the semantics of the sum, and can only be solved completely by
an implementation that deals with indeterminate programs. Unfortunately, there
are cases where the semantics of the sum does maintain uniqueness of normal
forms, but the concatenation of equations fails nonetheless.
Example 14.1
Consider the following two sets of equations:
1. {f(g(x))=a , a=b}
2. {g(c)=c , b=a}
Each of these sets individually has unique normal forms, and every reduction
sequence terminates. Their sum has uniqueness of normal forms, but not finite ter-
mination. Notice that, in the sum, the unique normal form of f(g(c)) is f(c), but
there is also an infinite reduction f(g(c)) = a = b = a = ···. The combined set of
equations cannot be processed by the equation interpreter, because of the overlap
between f(g(x)) and g(c).
□
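The behavior of the combined system can be checked mechanically with a toy rewriter (illustrative code, not the interpreter): reducing arguments before the root reaches the unique normal form f(c) from f(g(c)), whereas rewriting at the root forever alternates a and b, so a fuel bound guards against nontermination.

```python
# A toy innermost rewriter for the combined rules of the example.
RULES = [
    (('f', [('g', ['?x'])]), ('a', [])),      # f(g(x)) = a
    (('a', []), ('b', [])),                   # a = b
    (('g', [('c', [])]), ('c', [])),          # g(c) = c
    (('b', []), ('a', [])),                   # b = a
]

def match(pat, term, b):
    if isinstance(pat, str):                  # pattern variable such as '?x'
        b.setdefault(pat, term)
        return b[pat] == term
    return (pat[0] == term[0] and len(pat[1]) == len(term[1])
            and all(match(p, s, b) for p, s in zip(pat[1], term[1])))

def subst(rhs, b):
    if isinstance(rhs, str):
        return b[rhs]
    return (rhs[0], [subst(r, b) for r in rhs[1]])

def rewrite_innermost(t, fuel=50):
    """Normalize arguments first; fuel bounds the number of steps."""
    if fuel <= 0:
        return t
    t = (t[0], [rewrite_innermost(s, fuel - 1) for s in t[1]])
    for lhs, rhs in RULES:
        b = {}
        if match(lhs, t, b):
            return rewrite_innermost(subst(rhs, b), fuel - 1)
    return t
```

Starting from f(g(c)), the inner redex g(c) is contracted first, after which neither f(g(x)) nor any other left-hand side applies.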
This failure to achieve an implementation of the sum in all semantically natural
cases is a weakness of the current state of the equation interpreter. Of course, one
may fiddle with the semantics of the sum to get any implementation to work, but
we would rather seek implementation techniques that approach closer to the
semantic ideal.
The sum allows for combining equational programs, but by itself is not very
useful. We need a way of hiding certain symbols in a program, so that sums do
not produce surprising results due to the accidental use of the same symbol in both
summands. Other changes in notation will be required so that a module is not res-
tricted to interact only with other modules using the same notational conventions.
Both of these needs are satisfied by a single semantic construct.
Definition 14.3
Given a model <U₁,ψ₁> over the alphabet Σ₁, a model <U₂,ψ₂> over the alphabet
Σ₂, and a relation δ ⊆ Σ₁_Δ × Σ₂_Δ, <U₁,ψ₁> δ <U₂,ψ₂> if
for all E₁,F₁ ∈ Σ₁_Δ and E₂,F₂ ∈ Σ₂_Δ: E₁ δ E₂ & F₁ δ F₂ & ψ₁(E₁) = ψ₁(F₁) ⇒ ψ₂(E₂) = ψ₂(F₂).
For a set W of models over Σ₁, δ[W] is the set of all models over Σ₂ that are in the
δ relation to some member of W.
For a subset N ⊆ Σ₁_Δ, δ[N] is the set of all terms in Σ₂_Δ that are not in the δ relation
to any term not in N.
Given a computational world <Σ₁,W,N>, an alphabet Σ₂, and a relation
δ ⊆ Σ₁_Δ × Σ₂_Δ, the syntactic transform σ[<Σ₁,W,N>, δ, Σ₂] is <Σ₂, δ[W], δ[N]>.
□
The syntactic transform defined above may accomplish hiding of symbols, by letting
Σ₂ ⊆ Σ₁, with δ the equality relation restricted to Σ₂_Δ. A change of notation, for
example letting f(x,y,z) represent g(x,h(y,z)), is accomplished by a δ relating
each term E to the result E′ of replacing every subterm of the form
g(F₁,h(F₂,F₃)) with the form f(F₁,F₂,F₃). The syntactic transform is similar to
the derive operator of [BG77].
The syntactic transform for symbol hiding may be implemented by renaming
the symbols in Σ₁ − Σ₂ in such a way that they can never coincide with names
produced elsewhere. Other syntactic transforms require different implementations
depending on the characteristics of the δ relation. Suppose δ is a partial function,
defined by a set of equations suitable for the equation interpreter. Add to the
equations defining δ the equation
δ(v) = v where v is a variable symbol.
The resulting equational program may be used to transform equations for
<Σ₁,W,N> into equations defining σ[<Σ₁,W,N>, δ, Σ₂]. If the transformation
eliminates all instances of symbols not in Σ₂, and produces equations satisfying the
restrictions of Section 5, then we have an implementation of the syntactically
transformed world. We have not yet determined how easy or hard it is to produce
definitions of useful δs allowing for this sort of transformation.
If δ is a one-to-one correspondence between some subset of Σ₁_Δ and a subset
of Σ₂_Δ, and if δ and δ⁻¹ can be programmed nicely, then the syntactic
transform may be implemented by applying δ⁻¹ to the input, applying an existing
program for <Σ₁,W,N> to that result, and finally applying δ. If δ is defined by
an equational program, and if the result of reversing each equation satisfies the
restrictions of Section 5, then we may use those reversals as a program for δ⁻¹.
Alternatively, given equational programs for δ and δ⁻¹, we may try to verify that they
are inverses by applying the program for δ⁻¹ to the right-hand sides of equations in
the program for δ, to see if we can derive the corresponding left-hand sides.
Further study is required to determine whether either of these techniques works
often enough to be useful. In any case, the last two techniques produce something
more complex than a single equational program, raising the question of how to con-
tinue combining their results by more applications of sum and syntactic transform.
15. High-Level Programming Techniques
This section treats high-level programming concepts that fit naturally on the
evaluation mechanism of the equation interpreter. The current version of the inter-
preter does not provide any special syntax to support these techniques, but future
work may lead to convenient syntaxes for applying them.
15.1. Concurrency
Although the current implementation of the equation interpreter runs on conven-
tional sequential computing machines, some qualities of equational logic show
interesting potential for implementation on parallel machines of the future. Even
with a sequential implementation, the ability to express programming concepts
based on concurrent computations gives a conceptual advantage. There are three
distinguishable sources of concurrent programming power in a reduction-based
evaluation for equational logic, two of which are provided by the current implemen-
tation.
The simplest source of concurrent programming power arises from the possi-
bility of evaluating independent subexpressions concurrently. Roughly speaking,
this means that several, or all, of the arguments to a function may be evaluated
simultaneously (in general, evaluation of a single argument may be partial instead
of complete). This potential concurrency arises simply from the absence of side-
effects in equational evaluation, and is the same as the potential concurrency in
Functional Programming languages [Ba78]. A genuinely parallel implementation
of the equational interpreter might realize a substantial advantage from such con-
currency. There is also a definite conceptual advantage to the programmer in
knowing that order of evaluation of arguments is irrelevant to the final result. In
most cases, however, no style of programming is supported that could not translate
quite simply into sequential evaluation in some particular order.
Concurrent evaluation of independent subexpressions becomes an essential
feature when it is possible to reach a normal form without completing all of the
evaluations of subexpressions, and when there is no way a priori to determine how
much evaluation of which subexpressions is required. The most natural example of
this behavior is in the parallel or, defined by the equations:
or(true, x) = true;
or(x, true) = true;
or(false, false) = false.
Intuitively, it seems essential to evaluate the arguments of the or concurrently,
since one of them might evaluate to true while the other evaluation fails to ter-
minate. In the presence of other equations defining other computable functions,
there is no way to predict which of the arguments will evaluate to a truth value.
Translation of the intuitively concurrent evaluation of an expression containing ors
into a sequential computation requires a rather elaborate mechanism for explicitly
interleaving evaluation steps on the two arguments of an or. The current imple-
mentation of the equation interpreter does not support the essential concurrency
involved in the parallel or, due to the restriction number 5 of Section 5 (left-
sequentiality). Section 19 presents evidence that the parallel or cannot be satisfac-
torily simulated by the sequentializable definitions allowed by the current imple-
mentation. Future versions will support the parallel or, and similar definitions.
The basis of the implementation is a multitasking simulation of concurrency on a
sequential machine. We do not know yet how small the overhead of the multitask-
ing can be, and that stage of the project awaits a careful study of the concrete
134 15. High-Level Techniques
details involved. The general problems of supporting this sort of concurrency on
conventional sequential machines are discussed in Section 18.3.
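To make the required interleaving concrete, the following Python sketch simulates the parallel or on a sequential machine. The generator-based protocol, and the names por, steps, and diverge, are invented for illustration; they are not part of the equation interpreter.

```python
# Each evaluation is a generator that yields None while still working
# and finally yields its boolean result.  por interleaves steps of its
# two arguments round-robin, so one divergent argument cannot block a
# true answer from the other.

def steps(value, delay):
    """An evaluation that takes `delay` steps to produce `value`."""
    for _ in range(delay):
        yield None
    yield value

def diverge():
    """An evaluation that never terminates."""
    while True:
        yield None

def por(left, right):
    """or(true, x) = or(x, true) = true; or(false, false) = false."""
    pending = [left, right]
    while pending:
        gen = pending.pop(0)
        step = next(gen)
        if step is None:
            pending.append(gen)      # still working: run the other side
        elif step is True:
            return True              # one true argument suffices
        # a False result is dropped; the other side decides the answer
    return False

print(por(steps(True, 3), diverge()))         # True, despite divergence
print(por(steps(False, 1), steps(False, 4)))  # False
```

Note how the round-robin queue is exactly the "elaborate mechanism for explicitly interleaving evaluation steps" that the text describes; no such machinery is needed when the machine itself is parallel.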
The final source of concurrent behavior is easily overlooked, but is arguably
the most useful of them all. This sort of concurrency results from the outermost,
or "lazy", evaluation strategy, that allows nested subexpressions to be reduced con-
currently. Roughly, this means that evaluation of a function value may go on con-
currently with evaluation of the arguments. Every piece of partial information
about the arguments may immediately result in partial, or even complete, evalua-
tion of a function value. As with the parallel or, this sort of concurrency translates
into sequential behavior only at the cost of a careful interleaving of steps from the
intuitively concurrent components of a computation. Unlike the unconstrained
interleaving of the parallel or, the interleaving of nested evaluation steps is highly
constrained by the need to generate enough information about arguments to allow
an evaluation step on a function of those arguments. Nested concurrency is sup-
ported by the current version of the interpreter, as it depends only on the outermost
evaluation strategy. It allows a style of programming based on pipelined corou-
tines, or more generally on dataflow graphs, as shown in Section 15.3.
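The pipelined-coroutine style that nested concurrency supports can be suggested by a Python generator pipeline. This is an illustrative analogy only: in equational programs the outermost reduction strategy, not an explicit coroutine mechanism, provides the behavior.

```python
# A producer/consumer pipeline: squares consumes each value of its
# argument stream as soon as it is available and immediately emits a
# result, just as outermost evaluation lets a function value be
# computed from partial information about its arguments.

def naturals(i=0):
    """The infinite stream i, i+1, i+2, ..."""
    while True:
        yield i
        i += 1

def squares(xs):
    for x in xs:        # partial information about the argument ...
        yield x * x     # ... immediately yields partial output

def firstn(n, xs):
    """Force only the first n elements of an infinite stream."""
    return [next(xs) for _ in range(n)]

print(firstn(5, squares(naturals())))   # [0, 1, 4, 9, 16]
```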
15.2. Nondeterminism vs. Indeterminacy
Dijkstra argues eloquently [Di76] that programmers should not be required to
specify details that are inessential to solving a given problem. Not only is inessen-
tial detail wasteful of programming time, it may have an adverse effect on the clar-
ity of the program, by requiring a reader to separate the essential issues from the
inessential. On this consideration, Dijkstra proposes a nondeterministic program-
ming language, based on guarded commands. Not only are the computation steps
nondeterministic, the final answer of a guarded-command program may not be
uniquely determined by the inputs. This indeterminacy in the output is sometimes
desirable in addition to nondeterminism in the computation steps, since many prob-
lems admit several acceptable answers.
The equation interpreter supports some of the advantages of nondeterministic
specification of evaluation steps. Although the current implementation chooses,
rather arbitrarily, to evaluate from left to right, the semantics of the language
allow for any order of evaluation that finds a normal form. The presence or
absence of real nondeterminism in the implementation is never visible to a user in
the results of evaluation, since the restrictions of Section 5 guarantee uniqueness of
the normal form, independently of order of evaluation. This guarantee of unique-
ness is helpful in solving problems with uniquely determined answers, but it
prevents taking full advantage of the simplifications allowed by nondeterministic
and indeterminate programs for those problems with several acceptable answers. In
particular, it prevents a satisfying implementation of guarded commands [Di76] by
equations. The ideal facility would allow equational definitions with multiple nor-
mal forms, but recognize special cases where uniqueness is guaranteed.
There are no plans to introduce indeterminacy into the equation interpreter,
not because of anything fundamentally repugnant about indeterminacy, but
because the only simple way that we know to relax the uniqueness of normal forms
causes other problems in the semantics of equational programming. The unique-
ness of normal forms in equations satisfying the restrictions of Section 5 is a conse-
quence of the Church-Rosser, or confluence, property. This property says that,
whenever an expression A may reduce to two different expressions B and C, then
there is another expression D to which both B and C reduce. The confluence pro-
perty is required, not only to guarantee uniqueness of normal forms, but also to
guarantee that some normal form will be found whenever such exists. The problem
arises when some expression A reduces to normal form, and also has an infinite
reduction sequence. The confluence property guarantees that, no matter how far
we pursue the infinite path, it is still possible to reduce to the normal form.
Without the confluence property, we may have such an A, and an expression B
appearing on its infinite reduction sequence, but B may not be reducible to normal
form. Yet, by the semantics of equational logic, B is equal to A, and therefore to
any normal form of A. Figure 15.2.1 shows this situation graphically.
Figure 15.2.1
So, in a system of equations without the confluence property, there may be an
expression B with a normal form, but the only way to find that normal form may
require backwards reductions. All of the helpful theory used to find an efficient
forward reduction to normal form fails to apply to backward reductions. It seems
quite unlikely that a weaker condition than confluence can be designed to allow
indeterminacy, but guarantee reductions to normal forms, so an indeterminate
equation interpreter probably requires a substantial new concept in evaluation tech-
niques. Even if such conditions are found, equational programming is probably the
wrong setting for indeterminacy, since it would be impossible to have, for example,
an expression α equal to each of the normal forms a and b, and an expression β
equal to each of the normal forms b and c, without also having α = c and β = a.
The computation techniques of reduction sequences could be applied to asymmetric
congruence-like relations in order to produce indeterminacy, but such relations are
not well established in logic, as is the symmetric congruence relation of equality.
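The failure of forward reduction without confluence can be made concrete with a toy rewrite system, invented purely for illustration, in which A reduces both to a normal form N and to a looping expression B:

```python
# A non-confluent system:  A -> N (a normal form), A -> B, B -> B.
# B lies on an infinite reduction path of A, is equal to A (and hence
# to N) in the equational theory, yet no forward reduction from B
# ever reaches N.

rules = {"A": ["N", "B"], "B": ["B"]}   # N has no rules: it is normal

def reachable(term, depth):
    """All terms reachable from `term` in at most `depth` forward steps."""
    seen = {term}
    frontier = {term}
    for _ in range(depth):
        frontier = {t2 for t in frontier for t2 in rules.get(t, [])}
        seen |= frontier
    return seen

print("N" in reachable("A", 1))   # True:  A reduces directly to N
print("N" in reachable("B", 50))  # False: only a backward step recovers N
```

With confluence the second query could never be False for a term equal to a normal form, which is exactly the guarantee the restrictions of Section 5 preserve.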
15.3. Dataflow
The nested concurrency provided by outermost evaluation allows a very simple
translation of dataflow programs into equational programs. A dataflow program
[KM66] consists of a directed graph with an optional entry and an exit. The entry,
if any, represents an input sequence of values, the exit an output sequence. Each
edge in the graph represents a sequence of values communicated between two
processes, represented by the nodes. In the semantics of a dataflow graph, the
processes at the nodes are restricted to view their incoming sequences in order, and
to produce each outgoing sequence in order, although several proposals for imple-
menting them use numerical tags to simulate the order, and allow the actual time
sequence to be different. A common kind of node, called a copy node, contains a
process that merely copies its single incoming edge to each of its outgoing edges. It
is easier for our purposes to group a copy node and its edges into one multiedge
with a single head and several tails.
Every dataflow graph in which the node processes are all determinate as func-
tions of their incoming sequences (i.e., they cannot make arbitrary or timing-
dependent steps) may be translated easily into an equational program running on
the equation interpreter. The basic idea is to use a convenient list notation, such as
LISP's, and replace each edge in the dataflow graph by a constant symbol, whose
value will turn out to be the (possibly infinite) sequence of values transmitted
through that edge. Each node becomes a set of function symbols, one for each out-
going edge, with arity equal to the number of incoming edges. Equations are writ-
ten to define each node function. Finally, the connection structure of the dataflow
graph is realized by a single defining equation for each of the edge constants. That
is, if f is the function symbol representing a node with incoming edges a and b,
and outgoing edge c, include the equation c[] = f[a[]; b[]]. In some cases structural
equations may be condensed with some of the equations defining node processes.
In particular, tree-shaped subgraphs of the dataflow graph may be condensed into
single expressions, using edge names only to break loops. A single example should
suffice to illustrate the technique.
Example 15.3.1
Consider the dataflow graph of Figure 15.3.1. The labels on edges and nodes indi-
cate the corresponding constant and function symbols. Assuming that equations
are given to define f, g, h, the following equations give the structure of the graph:
a[] = f[input[]; b[]];
b[] = g[a[]];
output[] = h[a[]; b[]].
These equations may be condensed by eliminating either a or b (but not both),
Figure 15.3.1
yielding
b[] = g[f[input[]; b[]]];
output[] = h[f[input[]; b[]]; b[]].
or
a[] = f[input[]; g[a[]]];
output[] = h[a[]; g[a[]]].
□
In order to make use of the equational translation of a dataflow program, we also
require a definition of input, and some mechanism for selecting the desired part of
output, unless it is known to be finite (and even short). The idea of translating
dataflow graphs into equations comes from Lucid [WA85].
For a more substantial example of dataflow programming, consider the prime
number sieve of Eratosthenes. This elegant version of the prime number sieve is
based on a dataflow program by McIlroy [Mc68, KM77]. The basic idea is to take
the infinite list (2 3 4 ...), and remove all multiples of primes, in order to produce
the infinite list of primes (the same one used to produce the multiples of primes
that must be removed to produce itself). Figure 15.3.2 shows the dataflow graph
corresponding to this idea, with each edge and node labelled by its corresponding
symbol in the equational program.
Figure 15.3.2
The following equational program captures the concept of the dataflow graph
above, varying the notation a bit for clarity. Of course, in the equational program,
it is essential to evaluate, not the infinite lists themselves, but an expression produc-
ing some finite sublist.
Example 15.3.2
: The following definitions are intended to allow production of lists of
: prime numbers. The list of the first i prime numbers is firstn[i; primes[]].
Symbols
: List construction and manipulation operators
cons: 2;
nil: 0;
first: 1;
tail: 1;
firstn: 2;
: Logical and arithmetic operators
if: 3;
add, subtract, multiply, modulo, equ, less: 2;
: Operators associated with the prime sieve
intlist: 1;
sieve: 2;
fact: 2;
primes: 0;
: Primitive domains
include integer_numerals, truth_values.
For all i, j, q, r:
: first[q] is the first element in the list q.
: tail[q] is the list of all but the first element in the list q.
first[(i . q)] = i; tail[(i . q)] = q;
: firstn[i; q] is the list of the first i elements in the list q.
firstn[i; q] = if[equ[i; 0]; (); (first[q] . firstn[subtract[i; 1]; tail[q]])];
: if is the standard conditional function.
if[true; i; j] = i; if[false; i; j] = j;
include addint, multint, subint, modint, equint, lessint;
: intlist[i] is the infinite list (i i+1 i+2 ...).
intlist[i] = (i . intlist[add[i; 1]]);
: The definitions of sieve and fact assume that r is given in increasing order, that r
: contains all of the prime numbers, and that r contains nothing less than 2.
: sieve[q; r] is the infinite list of those elements of the infinite list q that are
: not multiples of anything on the infinite list r.
sieve[(i . q); r] = if[fact[i; r]; sieve[q; r]; (i . sieve[q; r])];
: fact[i; r] is true iff the infinite list r contains a nontrivial factor of i.
fact[i; (j . r)] = if[less[i; multiply[j; j]];
                      false;
                      if[equ[modulo[i; j]; 0];
                         true;
                         fact[i; r]]];
: primes[] is the infinite list of prime numbers, (2 3 5 7 11 13 17 ...).
primes[] = (2 . sieve[intlist[3]; primes[]]).
The correct behavior of this dataflow program for primes depends on the not-so-
obvious fact that there is always a prime between n and n². If not, the loop
through the sieve and cons nodes would deadlock.
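Under the assumption that Python's generator protocol can stand in for outermost evaluation of infinite lists, the sieve may be sketched as follows. The names mirror the equational program; the sketch is an illustration, not interpreter code.

```python
# intlist, fact, sieve, and primes transcribe the equations above.
# Each call to primes() builds a fresh generator, playing the role of
# the feedback edge in the dataflow graph.

def intlist(i):
    """The infinite list (i i+1 i+2 ...)."""
    while True:
        yield i
        i += 1

def fact(i, r):
    """True iff the increasing infinite list r holds a nontrivial factor of i."""
    for j in r:
        if i < j * j:
            return False
        if i % j == 0:
            return True

def sieve(q, r_maker):
    """Elements of q that are not multiples of anything on r_maker()."""
    for i in q:
        if not fact(i, r_maker()):
            yield i

def primes():
    yield 2
    yield from sieve(intlist(3), primes)   # the output feeds back into itself

def firstn(n, xs):
    return [next(xs) for _ in range(n)]

print(firstn(7, primes()))   # [2, 3, 5, 7, 11, 13, 17]
```

The test i < j*j in fact is what bounds the recursion: only primes up to the square root of i are ever demanded, so the feedback loop never deadlocks.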
While the outermost evaluation strategy guarantees that the internal computa-
tions of an equational program will satisfy the semantics of an associated dataflow
graph, there is still an interesting issue related to input to and output from the
interpreter. At the abstract level, input is any expression, and output is a
corresponding normal form. It is semantically legitimate to think of the input
expression being provided instantaneously at the beginning of the computation, and
the output expression being produced instantaneously at the end. An implementa-
tion following that idea strictly will not allow a dataflow behavior at the input and
output interfaces of the interpreter. The all-at-once style of input and output forms
a barrier to the composition of several equational programs, since it forces the, pos-
sibly very large, expressions transmitted between them to be produced totally,
before the receiving program may start. In the worst case, an infinite term may be
produced, even though only a finite part of it is required by the next step.
The current version of the equation interpreter incorporates a partial dataflow
interface on output, but none on input. The output pretty-printers do not support
incremental output, so a user may only benefit by using the internal form of out-
put, or writing his own incremental pretty-printer. Interestingly, incremental out-
put from the interpreter was easier to code than the all-at-once form. An output
process drives the whole evaluation, by traversing the input expression. When it
reaches a symbol that is stable -- that can clearly never change as a result of
further reduction -- it outputs that symbol, and breaks the evaluation problem into
several independent problems associated with the arguments to the stable symbol.
Section 17.2 gives a careful definition of stability, and Section 18.2 describes how it
is detected. When the output process finds an unstable symbol, it initiates evalua-
tion, which proceeds only until that particular symbol is stable. The control struc-
ture described above is precisely the right one for outermost evaluation, even if
incremental output is not desired.
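The control structure described above might be sketched as follows in Python. Thunks stand in for unevaluated subexpressions, and "callable" is a crude stand-in for the stability test of Section 17.2; the names are invented for illustration.

```python
# An output process that drives evaluation: it forces just enough
# reduction to stabilize each head symbol, emits the symbol, and then
# treats the arguments as independent evaluation problems.

def emit(expr, out):
    """Leftmost traversal; a stable node is a (symbol, args) pair,
    an unstable one is a thunk that must be evaluated first."""
    while callable(expr):          # evaluate only until the head is stable
        expr = expr()
    head, args = expr
    out.append(head)               # output the stable symbol immediately
    for a in args:
        emit(a, out)               # independent subproblems

# cons[1; ...] whose tail does not exist until the printer demands it.
expr = ("cons", [("1", []),
                 lambda: ("cons", [("2", []), ("nil", [])])])

out = []
emit(expr, out)
print(out)   # ['cons', '1', 'cons', '2', 'nil']
```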
In the computation described above, the interpreter program must contain an
a priori choice of traversal order for expressions -- leftmost being the natural
order given current typographical conventions. Leftmost production of an output
expression would support natural incremental output of lists, using the conventional
LISP notation. Since the essential data structure of the equation interpreter is
trees, not lists, and there is no reason to expect all applications to respect a left-
most discipline, a future version of the interpreter should support a more flexible
output interface. Probably the right way to achieve such an interface is to think of
an interactive question-answer dialogue between the consumer of output and the
interpreter. The precise design of the question-answer language requires further
study, but it is likely to be something like the following. The output consumer
possesses at least one, and probably several, cursors that point to particular subex-
pressions of the normal form expression coming from the interpreter. The output
consumer may issue commands of the form "move cursor x to a", where a may be
the father, or any specified son, of x’s current position. The interpreter does pre-
cisely enough evaluation to discover the new symbol under the cursor after each
motion, and reports that symbol on its output. Thus, the order of traversal may be
determined dynamically as a result of the symbols seen so far.
The question-answer output interface described above may also be used for the
input interface, with the equation interpreter producing the questions, and some
input process providing the answers. Such an interface will certainly allow connec-
tion of several equational programs in a generalized sort of pipeline. Another sort
of input interface is likely to be more valuable, and also more difficult to construct.
In order for equational programs to be used as interactive back ends behind struc-
ture editors, a purpose for which the equational programming style appears to be
nicely suited, as argued in Section 13, an incremental input interface must be pro-
vided in which the input process (essentially the user himself) determines the order
in which the input expression is produced. Worse, existing portions of the input
term may change. Changing input is substantially more difficult to accommodate
than incrementally produced input. Besides requiring an efficient mechanism for
incremental reevaluation, avoiding the reevaluation of unchanged portions of an
expression, some adjustment of the output interface is required to notify the output
consumer of changes as they occur. Apparently, the output consumer must be able
to specify a region of interest, and be notified of whatever happens in that region of
the expression. Further study is required to find a convenient representation of a
region of interest, taking into account that the topology of the expression is subject
to change as well as the symbol appearing at a particular node.
The incremental evaluation problem for equational programs appears to be
substantially more difficult than that for attribute grammars [RTD83]. The
elegance of Teitelbaum’s and Reps’ optimal reevaluation strategies for attribute
grammars gives that formalism a strong edge in competition for beyond-context-
free processing. Work is in progress to close that gap. The same factors that make
incremental evaluation more difficult for equational programs will also make a good
solution more valuable than that for attribute grammars. The optimal reevaluation
strategy for attribute grammars treats attributes and the functions that compute
one attribute from another as primitives. The reevaluation strategy only deter-
mines which attributes should be recomputed, it does not treat the problem of
incremental reevaluation of an individual attribute with significant structure. Yet,
one of the most critical performance issues for attribute reevaluation is avoidance
of multiple copies and excess reevaluation of components of highly structured attri-
butes, especially symbol tables. A lot of work on special cases of reevaluation of
large, structured attributes is underway, but we are not aware of any thoroughly
general approach. A good incremental evaluation strategy for equational programs
will inherently solve the total problem, since the expression model of data underly-
ing the equation interpreter makes all structure in data explicit, rather than allow-
ing large structures to be treated as primitives.
All of the preceding discussion of input and output ignores the possibility of
sharing of identical subexpressions in an expression. The equation interpreter could
not achieve an acceptable performance in many cases without such sharing.
Perhaps such sharing should be accommodated in the treatment of input and out-
put as well, but careful thought is required to do so without distasteful complexity.
15.4. Dynamic Programming
Dynamic programming may be viewed as a general technique for transforming an
inefficient recursive program into a more efficient one that stores some portion of
the graph of the recursively defined function in an array, in order to avoid recom-
putation of function values that are used repeatedly. In a typical application of
dynamic programming, the user must completely specify the way in which the
graph of the function is to be arranged in an array, and the order in which the
graph is to be computed. The latter task may be handled automatically by the
equation interpreter. To illustrate this automation of part of the dynamic program-
ming task, we give equations for the optimal matrix multiplication problem of
[AHU74]. Instead of defining only a small finite part of the graph of the cost
function, we define the infinite graph, and the outermost evaluation strategy of the
equation interpreter guarantees that only the relevant part of the infinite graph is
actually computed.
Example 15.4.1
The following equations solve the optimal matrix multiplication problem from The
Design and Analysis of Computer Algorithms, by Aho, Hopcroft and Ullman, sec-
tion 2.8. Given a sequence of matrix dimensions, (d0 d1 ... dm), the problem is to
find the least cost for multiplying out a sequence of matrices M1 × M2 × ... × Mm,
where Mi is d(i-1)×di, assuming that multiplying an i×j matrix by a j×k matrix to
get an i×k matrix costs i×j×k. There is an obvious, but exponentially inefficient
recursive solution, expressed in a more liberal notation than that allowed by the
equation interpreter:

cost[(d0 ... dm)] =
    min{cost[(d0 ... di)] + cost[(di ... dm)] + d0×di×dm | 0 < i < m}

cost[(d0 d1)] = 0
The problem with the recursive solution above is that certain values of the function
cost are calculated repeatedly in the recursion. Dynamic programming yields a
polynomial algorithm, by storing the values of cost in an array, and computing
each required value only once. In the following equations, the function cost is
represented by an infinite-dimensional infinite list giving the graph of the function:
costgraph[()] =
    (0 (cost[(1)] (cost[(1 1)] (cost[(1 1 1)] ... )
                  (cost[(1 1 2)] ... )
                  ... )
       (cost[(1 2)] (cost[(1 2 1)] ... )
                    (cost[(1 2 2)] ... )
                    ... )
       ... )
     (cost[(2)] (cost[(2 1)] (cost[(2 1 1)] ... )
                ... )
     ... )
That is, cost[(d0 ... dm)] is the first element of the list which is element dm+1 of
element d(m-1)+1 of ... element d0+1 of costgraph[()]. cost[(i)] is always 0, but
explicit inclusion of these 0s simplifies the structure of costgraph. costgraph[a],
for a ≠ (), is the fragment of costgraph[()] whose indexes are all prefixed by a.
Symbols
: operators directly related to the computation of cost
cost: 1;
costgraph: 1;
costrow: 2;
reccost: 1;
subcosts: 2;
: list-manipulation, logical, and arithmetic operators
cons: 2;
nil: 0;
min: 1;
index: 2;
length: 1;
element: 2;
equ: 2;
less: 2;
subtract: 2;
multiply: 2;
include integer_numerals, truth_values.
For all a, b, i, j, k, x, y:
cost[a] = index[a; costgraph[()]];
: costgraph[a] is the infinite graph of the cost function for arguments starting
: with the prefix a.
costgraph[a] = (reccost[a] . costrow[a; 1]);
: costrow[a; i] is the infinite list
: (costgraph[ai] costgraph[ai+1] ... )
: where ai is a with i added on at the end.
costrow[a; i] =
(costgraph[addend[a; i]] . costrow[a; add[i; 1]]);
: reccost[a] has the same value as cost[a], but is defined by the recursive equations
: from the header.
reccost[(i j)] = 0; reccost[(i)] = 0; reccost[()] = 0;
reccost[(i j . a)] = min[subcosts[(i j . a); length[a]]]
where a is (k . b) end where;
: subcosts[a; i] is a finite list of the recursively computed costs of (d0 ... dm),
: fixing the last index removed at i, i-1, ... 1.
subcosts[a; i] =
if[equ[i; 0]; ();
   (add[add[cost[firstn[add[i; 1]; a]]; cost[aftern[i; a]]];
        multiply[multiply[first[a]; element[add[i; 1]; a]]; last[a]]]
    . subcosts[a; subtract[i; 1]])];
: Definitions of list-manipulation operators, logical and arithmetical operators.
min[(i)] = i;
min[(i . a)] = if[less[i; min[a]]; i; min[a]]
where a is (k . b) end where;
index[(); (x . b)] = x;
index[(i . a); x] = index[a; element[add[i; 1]; x]];
length[()] = 0;
length[(x . a)] = add[length[a]; 1];
element[i; (x . a)] = if[equ[i; 1]; x; element[subtract[i; 1]; a]];
firstn[i; a] = if[equ[i; 0]; (); (first[a] . firstn[subtract[i; 1]; tail[a]])];
first[(x . a)] = x; tail[(x . a)] = a;
aftern[i; a] = if[equ[i; 0]; a; aftern[subtract[i; 1]; tail[a]]];
last[(x)] = x;
last[(x y . a)] = last[(y . a)];
addend[(); y] = (y);
addend[(x . a); y] = (x . addend[a; y]);
if[true; x; y] = x; if[false; x; y] = y;
include addint, equint, subint, multint.
□
Although the algorithm in [AHU74] runs in time O(n³) on a problem of size
n, the equation interpreter, running the equations above, takes time O(n⁴) because
of the linear search required to look up elements of the graph of the cost
function. By structuring the graph as an appropriate search tree, the time could
be reduced to O(n³ log n), but there is no apparent way to achieve the cubic time
bound, since the equation interpreter has no provision for constant-time array
indexing. Even the search-tree implementation requires some careful thought in
the presence of infinite structures. Section 16.3 shows how search tree operations
may be defined by equations.
Avoidance of recomputation of function values in the example above depends
on the sharing strategy of the equation interpreter, described in Section 18.4. A
future version of the interpreter will provide, as an option, an alternate implemen-
tation of evaluation based on the congruence closure algorithm [NO80, Ch80]. In
that implementation, all recomputation will be avoided in every context, and the
naive recursive equations can be used without the exponential cost.
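The effect of the stored cost graph can be imitated in Python with explicit memoization of the naive recursive cost function. The sketch uses the standard functools.lru_cache; it illustrates the dynamic-programming effect, not the interpreter's sharing mechanism.

```python
# cost follows the recursive definition from the text; the cache plays
# the role of the stored (lazily materialized) graph of the function,
# so each subsequence's cost is computed only once.

from functools import lru_cache

@lru_cache(maxsize=None)
def cost(dims):
    """Least cost of multiplying matrices with dimension sequence dims,
    where matrix i is dims[i-1] x dims[i]."""
    if len(dims) <= 2:                       # cost[(d0 d1)] = 0
        return 0
    return min(cost(dims[:i + 1]) + cost(dims[i:])
               + dims[0] * dims[i] * dims[-1]
               for i in range(1, len(dims) - 1))

print(cost((10, 20, 50, 1, 100)))   # 2200
```

Without the cache the same function is exponential, exactly the contrast between the naive recursion and the costgraph encoding above.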
16. Implementing Efficient Data Structures in Equational Programs
The usefulness of the equation interpreter in its current form is severely limited by
the lack of carefully-designed data structures. Instead of providing many choices
of data structures, the equation interpreter provides the raw materials from which
different data structures may be built. In order to make the system usable for any
but small problems, a library of predefined data structures must be built. The
modular constructs from Section 14 will then be used to incorporate appropriate
definitions of data structures into a program as they are needed. This section
demonstrates the basic techniques used to define some popular efficient data struc-
tures.
16.1. Lists
Implementation of LISP-style lists in the equation interpreter is so straightforward
that it is contained in several examples earlier in the text. We define the symbols
car, cdr, cons, nil, equ, atom, and null, according to LISP usage, except that nil
is not taken to be an atomic symbol.
Example 16.1.1
Symbols
: List constructors.
cons: 2;
nil: 0;
include atomic_symbols;
: Selectors for the head, tail of a list.
car, cdr: 1;
: Predicates on lists.
equ: 2;
atom: 1;
null: 1;
: Boolean symbols.
and: 2;
include truth_values.
For all x, x1, x2, l, l1, l2:
car[(x . l)] = x;
cdr[(x . l)] = l;
equ[(); ()] = true;
equ[(x1 . l1); (x2 . l2)] = and[equ[x1; x2]; equ[l1; l2]];
equ[(x1 . l1); x2] = false
where x2 is either () or in atomic_symbols end or end where;
equ[x1; x2] = false
where x1 is in atomic_symbols, x2 is either () or (x . l) end or end where;
equ[(); x2] = false
where x2 is either (x . l) or in atomic_symbols end or end where;
include equatom;
atom[x] = true
where x is in atomic_symbols end where;
atom[()] = false;
atom[(x . l)] = false;
null[()] = true;
null[(x . l)] = false;
null[x] = false
where x is in atomic_symbols end where;
and[true; true] = true;
and[true; false] = false;
and[false; true] = false;
and[false; false] = false.
□
The implementation of lists described above is sufficient for all programming
in the style of pure LISP. It is not necessarily the implementation of choice for all
list applications. The following equational program defines nonempty lists using
cat (concatenate two lists) and list (create a list of one element) as constructors
instead of cons. The resulting notation for lists no longer shows the LISP preju-
dice toward processing from first to last, so, instead of car and cdr, the four selec-
tors first, last, head (all but the last), and tail (all but the first) are given. Since
empty lists are not represented, it is appropriate to have the test singleton instead
of null.
Example 16.1.2
Symbols
: List constructors.
cat: 2;
list: 1;
include atomic_symbols;
: List selectors.
first: 1;
last: 1;
head: 1;
tail: 1;
: Predicates on lists.
equ: 2;
singleton: 1.
For all x, x1, x2, l1, l2, l3, l4:
first(list(x)) = x;
first(cat(l1, l2)) = first(l1);
last(list(x)) = x;
last(cat(l1, l2)) = last(l2);
head(cat(l1, cat(l2, l3))) = cat(l1, head(cat(l2, l3)));
head(cat(l1, list(x))) = l1;
tail(cat(cat(l1, l2), l3)) = cat(tail(cat(l1, l2)), l3);
tail(cat(list(x), l3)) = l3;
equ(list(x1), list(x2)) = equ(x1, x2);
equ(l1, l2) = and(equ(first(l1), first(l2)), equ(tail(l1), tail(l2)))
where l1, l2 are cat(l3, l4) end where;
equ(list(x), cat(l1, l2)) = false;
equ(cat(l1, l2), list(x)) = false;
include equatom;
singleton(list(x)) = true;
singleton(cat(l1, l2)) = false;
: Use the same definition of and as in Example 16.1.1.
□
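The list-cat representation may be pictured as a binary tree over the elements. The following Python sketch (class names and helpers invented for illustration) shows why concatenation is constant-time while the selectors must walk the tree:

```python
# Lists are trees built from List (singleton) and Cat (concatenation)
# nodes.  cat is O(1): it allocates one node.  first and last descend
# the leftmost and rightmost spines respectively.

from dataclasses import dataclass

@dataclass
class List:          # list(x): the singleton list
    x: object

@dataclass
class Cat:           # cat(l1, l2): concatenation of two nonempty lists
    l1: object
    l2: object

def first(l):
    return l.x if isinstance(l, List) else first(l.l1)

def last(l):
    return l.x if isinstance(l, List) else last(l.l2)

def singleton(l):
    return isinstance(l, List)

l = Cat(Cat(List(1), List(2)), List(3))    # the list (1 2 3)
print(first(l), last(l), singleton(l))     # 1 3 False
```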
The list-cat representation of lists differs from the LISP version in making con-
catenation just as cheap as adding to the head, at the expense of an increase in the
cost of producing the first element of a list. Perhaps more significant is the effect
on infinite lists. A LISP list may only be infinite to the right; lists constructed
from cat may have infinite heads as well as infinite tails, and even infinite inter-
mediate segments.
The list-cat style of lists is symmetric with respect to treatment of the first
and last elements, but it still makes production of intermediate elements more
clumsy than firsts and lasts. A rather cute application of error markers, as
described in Section 12.3, minimizes the clumsiness. In the following equations,
select(n, l) selects, if possible, the nth element from the list l. short(i) reports
that a list was short by i elements for the purpose of producing a specified element.
Example 16.1.3
select(n, list(x)) = if(equ(n, 1), x, short(subtract(n, 1)));
select(n, cat(l1, l2)) =
if(tooshort(select(n, l1)),
   select(shortby(select(n, l1)), l2),
   select(n, l1));
tooshort(short(n)) = true;
tooshort(x) = false
where x is in atomic_symbols end where;
shortby(short(n)) = n;
if(true, x, y) = x; if(false, x, y) = y;
include subint.
□
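The error-marker idea of Example 16.1.3 can be sketched in Python as follows. Lists are represented by tuples, ("list", x) for a singleton and ("cat", l1, l2) for a concatenation, and a Short object records the shortfall; the names are illustrative, not interpreter syntax.

```python
# select(n, l) returns the nth element of l, or a Short marker telling
# by how many elements l fell short.  When the left part of a cat is
# too short by k, the desired element is element k of the right part.

class Short:
    def __init__(self, n):
        self.n = n          # the list was short by n elements

def select(n, l):
    if l[0] == "list":
        return l[1] if n == 1 else Short(n - 1)
    v = select(n, l[1])                 # try the left part first
    if isinstance(v, Short):
        return select(v.n, l[2])        # shortfall = index into the right part
    return v

l = ("cat", ("cat", ("list", "a"), ("list", "b")), ("list", "c"))
print(select(2, l))                     # 'b'
r = select(5, l)
print(isinstance(r, Short), r.n)        # True 2  (length 3, asked for 5)
```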
In order to give similar treatment to all elements of a list, an extension to the
cat notation is required. listlnth(n, l) marks the list l as having length n. As a
special case, listlnth(1, x) represents the singleton list of x, replacing list(x) in the
earlier examples. This variation implies a slightly different semantic view of lists,
where a singleton list is the same thing as its sole element. Regrettably, an addi-
tional symbol icat (inactive concatenation) is required for concatenations that
have already been associated with their lengths, in order to avoid an infinite com-
putation for a concatenation. The following equations may be used to redefine
156 16. Implementing Data Structures
select in the new notation.
Example 16.1.4
cat(listlnth(n1, l1), listlnth(n2, l2)) =
    listlnth(add(n1, n2), icat(listlnth(n1, l1), listlnth(n2, l2)));
select(n, listlnth(n1, l1)) =
    if(less(n1, n),
        short(subtract(n, n1)),
        select(n, l1));
select(1, x) = x where x is in atomic_symbols end where;
select(n, icat(listlnth(n1, l1), listlnth(n2, l2))) =
    if(less(n1, n),
        select(subtract(n, n1), listlnth(n2, l2)),
        select(n, l1)).
□
In addition to its clumsiness, the listlnth notation suffers from an inability to deal
with any sort of infinite list.
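The length annotations can be sketched as follows in Python (again an illustrative encoding of my own, not the interpreter's syntax): carrying the length lets select choose a branch without searching.

```python
# A length-annotated list is (length, body): a singleton is (1, x), and a
# concatenation pairs the two halves and adds their lengths, playing the
# role that listlnth and icat play in Example 16.1.4.

def llist(x):                 # singleton: the list is its sole element
    return (1, x)

def cat(l1, l2):
    return (l1[0] + l2[0], (l1, l2))

def select(n, l):
    length, body = l
    if length == 1:
        return body
    left, right = body
    if n > left[0]:
        return select(n - left[0], right)
    return select(n, left)

l = cat(cat(llist("a"), llist("b")), llist("c"))
assert [select(i, l) for i in (1, 2, 3)] == ["a", "b", "c"]
```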
Even using one of the variations of cat notation, provision should probably be
made for the empty list. The easiest way to add the empty list is to use the nullary
symbol empty, and allow any number of emptys to appear in a list, ignored by all
operations on that list. It is easy to modify the definitions of first, last, head, tail,
etc. to ignore emptys. The more efficient, but syntactically clumsier, solution is to
eliminate empty wherever it appears in a concatenation. Unfortunately, the obvious
solution of adding the equations cat(empty(), l) = l and cat(l, empty()) = l
will not work because of the restrictions on left-hand sides of equations. First,
these two equations have a common instance in cat(empty(), empty()). That problem
is easily avoided (at the cost of extra evaluation steps) by changing the second
equation to
cat(l, empty()) = l
    where l is either list(x) or cat(l1, l2) end or end where;
There is still a problem with overlap of the new equations with every other equation
describing a function recursively by its effect on a concatenation, for instance
the second equation in Example 16.1.2, first(cat(l1, l2)) = first(l1). In order to
avoid those overlaps, we must introduce the inactive concatenation symbol, icat,
using it in place of cat as the list constructor, and adding the equations
cat(empty(), l) = l;
cat(l, empty()) = l
    where l is either list(x) or icat(l1, l2) end or end where;
cat(l1, l2) = icat(l1, l2)
    where l1, l2 are either list(x) or icat(l1, l2) end or end where.
This is another example of the last technique for removing overlaps described in
Section 12.4.
While the variations on list representation described above avoid a certain
amount of unnecessary searching, and appear to have a heuristic value, their worst
case operations involve complete linear searches. To improve on the worst case
time for list operations, we must balance the tree representations of lists. Such bal-
anced list representations may be derived by taking the search tree representations
of Section 16.3, and omitting the keys.
16.2. Arrays
There is no way to implement arrays with constant access time in the equation
interpreter. Such arrays could be provided as predefined objects with predefined
operations, in the style of the arithmetic operations, but only at the cost of substan-
tial increase to the conceptual complexity of storage management. Instead, we pro-
pose to implement arrays as balanced trees, accepting a logarithmic rather than
constant access time. The following definitions implement one-dimensional arrays
ranging over arbitrary subranges of the integers. Three constructors are used:
array (i, j, a) denotes an array with indexes ranging from i to j, with contents
described by a. In describing the contents, arrbranch (a1, a2) denotes an array, or
subarray, in which locations indexed by integers ending with binary bit 0 are given
in al, and those indexed by integers ending with binary bit 1 are given in a2.
arrelement(x) denotes the single array element with value x. element(i, a)
produces the element indexed by i in the array a. constarray(i, j, x) produces an
array, indexed from i to j, containing (j − i)+1 copies of x. update(a, i, x) is
the array a with the element indexed by i changed to the value x.
Example 16.2.1
Symbols
: Array constructors.
array: 3;
arrbranch: 2;
arrelement: 1;
: Array selector.
element: 2;
: Array initializer.
constarray: 3;
: Array modifier.
update: 3;
: Functions used internally for array computations.
const: 2;
: Arithmetic operators for index computations.
equ: 2;
subtract, divide, modulo: 2;
include integer_numerals, truth_values;
if: 3.
For all i, j, k, a, a1, a2, x, y:
element(i, array(j, k, a)) = element(subtract(i, j), a);
element(i, arrbranch(a1, a2)) =
    if(equ(modulo(i, 2), 0),
        element(divide(i, 2), a1),
        element(divide(i, 2), a2));
element(0, arrelement(x)) = x;
constarray(i, j, x) = array(i, j, const(subtract(j, i), x));
const(i, x) = if(equ(i, 1),
    x,
    arrbranch(const(divide(i, 2), x),
        const(subtract(i, divide(i, 2)), x)));
update(array(i, j, a), k, x) = array(i, j, update(a, subtract(k, i), x));
update(arrbranch(a1, a2), i, x) =
    if(equ(modulo(i, 2), 0),
        arrbranch(update(a1, divide(i, 2), x), a2),
        arrbranch(a1, update(a2, divide(i, 2), x)));
update(arrelement(y), 0, x) = arrelement(x);
include equint, modint, divint, subint.
□
In principle, the constructor arrelement is superfluous, and array elements could
appear directly as arguments to arrbranch.
In that case, the equation
element (0, arrelement(x)) = x would be replaced by element (0, x) =x, with a
where clause restricting x to whatever syntactic forms were allowed for array ele-
ments. The version given above has the advantage of allowing any sorts of ele-
ments for arrays, including other arrays. Multidimensional arrays are easily
represented by arrays whose elements are arrays of dimension one less.
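The trie layout of Example 16.2.1 can be mimicked directly in Python. In this sketch (tuple encoding and function names are mine, not the book's notation), an index is routed by its low-order bit at each level, so element and update cost O(log n) rather than O(1).

```python
# An array of n elements is a binary trie: ("elem", x) is a single cell,
# and ("branch", a1, a2) holds even-ending indexes in a1, odd in a2.

def const(n, x):
    # A (sub)array of n copies of x, for n >= 1.
    if n == 1:
        return ("elem", x)
    return ("branch", const(n // 2, x), const(n - n // 2, x))

def element(i, a):
    if a[0] == "elem":
        return a[1]
    return element(i // 2, a[1 + i % 2])   # route by the low-order bit

def update(a, i, x):
    if a[0] == "elem":
        return ("elem", x)
    if i % 2 == 0:
        return ("branch", update(a[1], i // 2, x), a[2])
    return ("branch", a[1], update(a[2], i // 2, x))

a = const(8, 0)
a = update(a, 5, "hi")
assert element(5, a) == "hi"
assert element(4, a) == 0
```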
Notice that the outermost evaluation strategy of the equation interpreter
causes constarray and update expressions to be worked out only as far as required
to accommodate element operations. So, some condensation of sparsely accessed
arrays is provided automatically. It is very tempting to allow the element opera-
tion to take a greater advantage of sparseness, using the equation
element (i, const (j,x)) =x. Unfortunately, this equation overlaps with the recur-
sive definition of const, violating the syntactic restrictions of Section 5. In this
case, the violation is clearly benign, and future versions of the interpreter should
relax the restrictions to allow overlaps of this sort. Further research is required to
find an appropriate formal description of a useful class of benign overlaps. Such a
formalism will probably be based on the Knuth-Bendix closure algorithm [KB70],
extended to deal with possibly nonterminating reductions.
The explicit use of index bounds in array (i, j, a) prohibits infinite arrays, but
infinite arrays may be implemented with the arrbranch constructor, assuming that
the range always starts at 0. Arrays infinite on the left as well as the right are
easily provided by piecing together two arrbranch structures, and using an initial
comparison with 0 to direct operations to the appropriate portions.
In the array implementation described above, indexes are grouped in terms
according to their agreement on low-order bits. E.g., the even indexes all go to
the left of the first branch point, and the odd indexes go to the right. For applica-
tions involving only element and update operations, the order of indexes is
irrelevant, and this one was chosen for arithmetic simplicity. If other primitives,
operating on numerically contiguous sets of indexes, are desired, the definitions
may be modified to treat bits of an index from most significant to least significant,
at the price of slightly more complex arithmetic expressions in the definitions. This
rearrangement of indexes will also affect the benefits of sparse access by changing
the structure of contiguous elements in the data structure.
16.3. Search Trees and Tables
The arrays of Section 16.2 avoid explicit mention of index values in the data struc-
ture by assuming that those values are chosen from a contiguous range. Sparseness
in the use of the indexes may allow savings in space, due to the outermost evalua-
tion strategy leaving some instances of const unevaluated, but the time cost of each
element and update operation will be proportional to the logarithm of the total
index range, no matter how sparsely that range is used. When the range of legal
indexes is so great that even this logarithmic cost is not acceptable, balanced
search trees should be used instead of arrays. At the cost of storing index values
(usually called keys in this context) as well as element values, we can let the cost
of access be proportional to the logarithm of the number of keys actually used,
rather than the total range of keys. This section shows three alternative definitions
of balanced search trees. The first two were developed by Christoph Hoffmann,
and the description is adapted from [HO82b].
One popular balanced tree scheme that can be implemented by equations is
based on 2-3 trees, a special case of B-trees. Informally, a 2-3 tree is a data
structure with the following properties. In a 2-3 tree, there are 2-nodes, with two
sons, and 3-nodes, with three sons. A 2-node is labelled by a single key a, so that
the keys labelling nodes in the left subtree are all smaller than a, and the keys
labelling nodes in the right subtree are larger than a. A 3-node is labelled by two
keys a and b, with a<b, so that the keys in the left subtree are all smaller than a,
the keys in the middle subtree are between a and b, and the keys in the right sub-
tree are larger than b. A leaf is a node whose subtrees are all empty. A 2-3 tree
is perfectly balanced, i.e., the path lengths from the root to the leaves of the tree
are all equal. Figure 16.3.1 below shows an example of a 2-3 tree. In this section,
we will always show search trees as collections of keys only, with a membership
test. It is straightforward to add arbitrary information associated with each key,
and augment the membership test to produce that information. The augmentation
is easy, but obscures the more interesting issues having to do with balancing the
distribution of the keys, so it is omitted.
Figure 16.3.1
When inserting a new key k into a 2-3 tree, there are two cases to consider.
If the proper place for inserting k is a 2-node leaf, then we simply convert the leaf
to a 3-node. If the insertion should be made into a 3-node leaf, then we must
somehow restructure parts of the tree to make space for k. The restructuring
proceeds as follows. First form a 4-node, that is, a node with three keys a, b, and
c, and four subtrees, as shown in Figure 16.3.2(a). Of course, if we begin with a
leaf, then the subtrees are all empty. Now split the 4-node into three 2-nodes as
shown in Figure 16.3.2(b). The key of the middle 2-node must be inserted into the
node that is the former father of the 4-node, since, through the splitting, we have
Figure 16.3.2
increased the number of sons. If the father node is a 2-node, then it becomes a 3-
node, otherwise the splitting process is repeated on the father’s level. If there is no
father, i.e., if we have just split the tree root, then no further work is required.
Note that without the insertion of the middle node into the father we would des-
troy the balance of the tree.
We denote a 2-3 tree by an expression tree(t), where t represents the labelling
and structure of the tree. A nonempty subtree with a 2-node root is written
t2(x1, l1, x2), where x1 and x2 represent its left and right subtrees, and l1 is the
label of the node. Similarly, t3(x1, l1, x2, l2, x3) denotes a 3-node with labels l1
and l2 and subtrees x1, x2, and x3. The constant e denotes the empty subtree.
Example 16.3.1
The 2-3 tree of Figure 16.3.1 is represented by
tree(t2(t2(t2(e(), 1, e()),
          2,
          t3(e(), 3, e(), 4, e())),
       7,
       t3(t3(e(), 11, e(), 12, e()),
          13,
          t2(e(), 15, e()),
          18,
          t2(e(), 20, e()))))
□
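For readers who want to experiment, here is a small Python model of membership search over this representation (the tuple encoding is mine; the equational definition of member appears in Example 16.3.2 below).

```python
# 2-3 tree membership: t2 and t3 nodes guide a comparison-driven descent;
# every leaf is at the same depth, so a search takes O(log n) comparisons.

E = ("e",)

def t2(x1, l1, x2):
    return ("t2", x1, l1, x2)

def t3(x1, l1, x2, l2, x3):
    return ("t3", x1, l1, x2, l2, x3)

def member(k, t):
    if t[0] == "e":
        return False
    if t[0] == "t2":
        _, x1, l1, x2 = t
        return True if k == l1 else member(k, x1 if k < l1 else x2)
    _, x1, l1, x2, l2, x3 = t
    if k in (l1, l2):
        return True
    return member(k, x1 if k < l1 else (x2 if k < l2 else x3))

tree = t2(t2(E, 1, E), 3, t3(E, 7, E, 11, E))
assert all(member(k, tree) for k in (1, 3, 7, 11))
assert not member(5, tree)
```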
Now we program the insertion of a key k into a 2-3 tree tree (x). Insertion of
a key k proceeds by first locating the leaf in which to insert k. For this purpose,
we must compare k to the labels of a node. If the comparison detects equality,
then k is already in the tree and the insertion is done. Otherwise, the result of the
comparison determines a subtree into which k is inserted.
Example 16.3.2
Symbols
: Constructors for search trees.
: tree, t2, and t3 are active symbols, as well as constructors.
tree: 1;
t2: 3;
t3: 5;
e: 0;
include atomic_symbols, integer_numerals;
: Operations on search trees.
insert: 2;
member: 2;
: Symbols used in the definition of insert.
put: 3;
: Arithmetic and logical operators.
less: 2;
equ: 2;
or: 2;
if: 3.
For all k, l, l1, l2, x, x1, x2, x3, y:
: insert(k, x) inserts the key k in the tree x.
insert(k, tree(x)) = tree(insert(k, x));
insert(k, t2(x, l, y)) =
    if(equ(k, l), t2(x, l, y),
        if(less(k, l), t2(insert(k, x), l, y),
            t2(x, l, insert(k, y))));
insert(k, t3(x1, l1, x2, l2, x3)) =
    if(or(equ(k, l1), equ(k, l2)), t3(x1, l1, x2, l2, x3),
        if(less(k, l1), t3(insert(k, x1), l1, x2, l2, x3),
            if(less(k, l2), t3(x1, l1, insert(k, x2), l2, x3),
                t3(x1, l1, x2, l2, insert(k, x3)))));
insert(k, e()) = put(e(), k, e());
t2(put(x, k, y), l, x2) = t3(x, k, y, l, x2);
t2(x1, l, put(x, k, y)) = t3(x1, l, x, k, y);
t3(put(x, k, y), l1, x2, l2, x3) = put(t2(x, k, y), l1, t2(x2, l2, x3));
t3(x1, l1, put(x, k, y), l2, x3) = put(t2(x1, l1, x), k, t2(y, l2, x3));
t3(x1, l1, x2, l2, put(x, k, y)) = put(t2(x1, l1, x2), l2, t2(x, k, y));
tree(put(x, k, y)) = tree(t2(x, k, y));
if(true, x, y) = x; if(false, x, y) = y;
or(true, x) = true; or(false, x) = x;
include lessint, equint;
: member(k, x) tests whether the key k occurs in the tree x.
member(k, l) = equ(k, l)
    where l is in atomic_symbols end where;
member(k, e()) = false;
member(k, tree(x)) = member(k, x);
member(k, t2(x1, l1, x2)) =
    if(less(k, l1),
        member(k, x1),
        if(equ(k, l1),
            true,
            member(k, x2)));
member(k, t3(x1, l1, x2, l2, x3)) =
    if(less(k, l1),
        member(k, x1),
        if(equ(k, l1),
            true,
            if(less(k, l2),
                member(k, x2),
                if(equ(k, l2),
                    true,
                    member(k, x3))))).
□
The equations of Example 16.3.2 contain a program for inserting a key into a 2-3
tree. Although the equations are very intuitive, they do not obey the restrictions of
Section 5. For example, the second and fifth equations overlap in the expression
tree(insert(3, t2(put(e(), 2, e()), 4, e()))).
The problem arises conceptually because the described insertion proceeds in two
phases: a traversal from the tree root to the insertion point, followed by a reverse
traversal restructuring the nodes encountered up to the nearest 2-node, or up to the
root if no 2-node is found. The overlap in the expression above corresponds to the
competition between two insertions, one in the restructuring phase, when equation
5 applies, the other in the initial traversal, when equation 2 applies.
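The two phases are perhaps easiest to see in a direct recursive sketch. The following Python version (the tuple encoding is my own) mirrors Example 16.3.2: a returned put value carries a split node upward until a 2-node absorbs it, or until it reaches the root.

```python
# Downward phase: find the leaf.  Upward phase: a ("put", x, k, y) result
# plays the role of the put symbol, propagating a split 3-node upward.

def insert(k, t):
    if t[0] == "e":
        return ("put", ("e",), k, ("e",))
    if t[0] == "t2":
        _, x1, l1, x2 = t
        if k == l1:
            return t
        if k < l1:
            r = insert(k, x1)
            if r[0] == "put":           # absorb the split: 2-node -> 3-node
                return ("t3", r[1], r[2], r[3], l1, x2)
            return ("t2", r, l1, x2)
        r = insert(k, x2)
        if r[0] == "put":
            return ("t3", x1, l1, r[1], r[2], r[3])
        return ("t2", x1, l1, r)
    _, x1, l1, x2, l2, x3 = t
    if k in (l1, l2):
        return t
    if k < l1:
        r = insert(k, x1)
        if r[0] == "put":               # 3-node splits; pass a put upward
            return ("put", ("t2", r[1], r[2], r[3]), l1, ("t2", x2, l2, x3))
        return ("t3", r, l1, x2, l2, x3)
    if k < l2:
        r = insert(k, x2)
        if r[0] == "put":
            return ("put", ("t2", x1, l1, r[1]), r[2], ("t2", r[3], l2, x3))
        return ("t3", x1, l1, r, l2, x3)
    r = insert(k, x3)
    if r[0] == "put":
        return ("put", ("t2", x1, l1, x2), l2, ("t2", r[1], r[2], r[3]))
    return ("t3", x1, l1, x2, l2, r)

def tree_insert(k, t):                  # the tree(...) wrapper at the root
    r = insert(k, t)
    return ("t2", r[1], r[2], r[3]) if r[0] == "put" else r

def keys(t):                            # inorder listing, for checking
    if t[0] == "e":
        return []
    if t[0] == "t2":
        return keys(t[1]) + [t[2]] + keys(t[3])
    return keys(t[1]) + [t[2]] + keys(t[3]) + [t[4]] + keys(t[5])

t = ("e",)
for k in [5, 1, 9, 3, 7]:
    t = tree_insert(k, t)
assert keys(t) == [1, 3, 5, 7, 9]
```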
The problem can be solved in the traditional manner of setting locks to
prevent the progress of subsequent insertions where they may interfere with previous
updates which are not completed. This is easily done by indicating a locked
node as t2l(...) instead of t2(...), and t3l(...) instead of t3(...). The equations for
this solution are given below. Note that we force complete sequentiality of insertions,
because the root is locked in this solution. Notice also that the tree constructors
t2 and t3, which were active in Example 16.3.2, are pure constructors in
Example 16.3.3. The new symbols t2l and t3l acquire the active role. Since the
symbols t2 and t3 may be thought of as being split into active versions, t2l and
t3l, and inactive versions t2 and t3, this is another example of the last technique
of Section 12.4 for repairing overlaps, although it was originally thought of in
terms of record locking techniques from concurrent programming.
Example 16.3.3
Symbols
: Constructors for search trees.
tree: 1;
t2: 3;
t3: 5;
e: 0;
include atomic_symbols, integer_numerals;
: Operations on search trees.
insert: 2;
member: 2;
: Symbols used in the definition of insert.
put: 3;
treel: 1;
t2l: 3;
t3l: 5;
unlock: 1;
: Arithmetic and logical operators.
less: 2;
equ: 2;
or: 2;
if: 3.
For all k, l, l1, l2, x, x1, x2, x3, y:
: insert(k, x) inserts the key k in the tree x.
insert(k, tree(x)) = treel(insert(k, x));
insert(k, t2(x1, l1, x2)) =
    if(less(k, l1), t2l(insert(k, x1), l1, x2),
        if(less(l1, k), t2l(x1, l1, insert(k, x2)),
            unlock(t2(x1, l1, x2))));
insert(k, t3(x1, l1, x2, l2, x3)) =
    if(or(equ(k, l1), equ(k, l2)), unlock(t3(x1, l1, x2, l2, x3)),
        if(less(k, l1), t3l(insert(k, x1), l1, x2, l2, x3),
            if(less(k, l2), t3l(x1, l1, insert(k, x2), l2, x3),
                t3l(x1, l1, x2, l2, insert(k, x3)))));
insert(k, e()) = put(e(), k, e());
t2l(put(x, k, y), l, x2) = unlock(t3(x, k, y, l, x2));
t2l(x1, l, put(x, k, y)) = unlock(t3(x1, l, x, k, y));
t3l(put(x, k, y), l1, x2, l2, x3) = put(t2(x, k, y), l1, t2(x2, l2, x3));
t3l(x1, l1, put(x, k, y), l2, x3) = put(t2(x1, l1, x), k, t2(y, l2, x3));
t3l(x1, l1, x2, l2, put(x, k, y)) = put(t2(x1, l1, x2), l2, t2(x, k, y));
treel(put(x, k, y)) = tree(t2(x, k, y));
t2l(unlock(x1), l, x2) = unlock(t2(x1, l, x2));
t2l(x1, l, unlock(x2)) = unlock(t2(x1, l, x2));
t3l(unlock(x1), l1, x2, l2, x3) = unlock(t3(x1, l1, x2, l2, x3));
t3l(x1, l1, unlock(x2), l2, x3) = unlock(t3(x1, l1, x2, l2, x3));
t3l(x1, l1, x2, l2, unlock(x3)) = unlock(t3(x1, l1, x2, l2, x3));
treel(unlock(x)) = tree(x);
or(true, x) = true; or(false, x) = x;
: The equations for if, equ, less, and member are the same as in Example 16.3.2.
□
A different solution was proposed in [GS78], eliminating the need for locking
the whole traversal path. The trick is to split the nodes encountered on the down-
ward traversal if they do not permit the insertion of another key without splitting.
Since a 3-node at the root of a 2-3 tree cannot be split on the way down without
destroying the balance (there is no third key available), we must now deal with
2-3-4 trees, permitting 4-nodes also, with three keys l1, l2, and l3, represented by
t4(x1, l1, x2, l2, x3, l3, x4).
The equational program becomes a little more complex, since we have an addi-
tional node type. Nonetheless, 14 equations suffice. The transformations required
to eliminate a 4-node as point of insertion are given by Figure 16.3.3. Mirror
image cases have been omitted.
An equational definition of 2-3-4 tree insertion follows. Note that with this
program we may insert keys in parallel without interference problems. Notice also
that this solution respects the constructor discipline of Section 12.1.
Example 16.3.4
Symbols
: Constructors for search trees.
tree: 1;
t2: 3;
t3: 5;
t4: 7;
e: 0;
include atomic_symbols, integer_numerals;
: Operations on search trees.
insert: 2;
member: 2;
: Symbols used in the definition of insert.
chk2: 5;
chk3: 7;
Figure 16.3.3
: Arithmetic and logical operators.
less: 2;
equ: 2;
or: 2;
if: 3.
For all i, k, l, l1, l2, l3, m, x, x1, x2, x3, x4, y:
: insert(k, x) inserts the key k in the tree x.
insert(k, tree(e())) = tree(t2(e(), k, e()));
insert(k, tree(t2(x1, l1, x2))) = tree(insert(k, t2(x1, l1, x2)));
insert(k, tree(t3(x1, l1, x2, l2, x3))) =
    tree(insert(k, t3(x1, l1, x2, l2, x3)));
insert(k, tree(t4(x1, l1, x2, l2, x3, l3, x4))) =
    tree(insert(k, t2(t2(x1, l1, x2), l2, t2(x3, l3, x4))));
insert(k, t2(x1, l1, x2)) =
    if(less(k, l1), chk2(k, x1, x2, l1, 1),
        if(less(l1, k), chk2(k, x2, x1, l1, 2),
            t2(x1, l1, x2)));
chk2(k, e(), y, l, i) =
    if(equ(i, 1),
        t3(e(), k, e(), l, e()),
        t3(e(), l, e(), k, e()));
chk2(k, t2(x1, l1, x2), y, l, i) =
    if(equ(i, 1),
        t2(insert(k, t2(x1, l1, x2)), l, y),
        t2(y, l, insert(k, t2(x1, l1, x2))));
chk2(k, t3(x1, l1, x2, l2, x3), y, l, i) =
    if(equ(i, 1), t2(insert(k, t3(x1, l1, x2, l2, x3)), l, y),
        t2(y, l, insert(k, t3(x1, l1, x2, l2, x3))));
chk2(k, t4(x1, l1, x2, l2, x3, l3, x4), y, l, i) =
    if(equ(k, l2),
        if(equ(i, 1),
            t2(t4(x1, l1, x2, l2, x3, l3, x4), l, y),
            t2(y, l, t4(x1, l1, x2, l2, x3, l3, x4))),
        if(less(k, l2),
            if(equ(i, 1),
                t3(insert(k, t2(x1, l1, x2)), l2, t2(x3, l3, x4), l, y),
                t3(y, l, insert(k, t2(x1, l1, x2)), l2, t2(x3, l3, x4))),
            if(equ(i, 1),
                t3(t2(x1, l1, x2), l2, insert(k, t2(x3, l3, x4)), l, y),
                t3(y, l, t2(x1, l1, x2), l2, insert(k, t2(x3, l3, x4))))));
insert(k, t3(x1, l1, x2, l2, x3)) =
    if(or(equ(k, l1), equ(k, l2)),
        t3(x1, l1, x2, l2, x3),
        if(less(k, l1),
            chk3(k, x1, x2, x3, l1, l2, 1),
            if(less(k, l2),
                chk3(k, x2, x1, x3, l1, l2, 2),
                chk3(k, x3, x1, x2, l1, l2, 3))));
chk3(k, e(), x, y, l, m, i) =
    if(equ(i, 1),
        t4(e(), k, e(), l, e(), m, e()),
        if(equ(i, 2),
            t4(e(), l, e(), k, e(), m, e()),
            t4(e(), l, e(), m, e(), k, e())));
chk3(k, t2(x1, l1, x2), x, y, l, m, i) =
    if(equ(i, 1),
        t3(insert(k, t2(x1, l1, x2)), l, x, m, y),
        if(equ(i, 2),
            t3(x, l, insert(k, t2(x1, l1, x2)), m, y),
            t3(x, l, y, m, insert(k, t2(x1, l1, x2)))));
chk3(k, t3(x1, l1, x2, l2, x3), x, y, l, m, i) =
    if(equ(i, 1),
        t3(insert(k, t3(x1, l1, x2, l2, x3)), l, x, m, y),
        if(equ(i, 2),
            t3(x, l, insert(k, t3(x1, l1, x2, l2, x3)), m, y),
            t3(x, l, y, m, insert(k, t3(x1, l1, x2, l2, x3)))));
chk3(k, t4(x1, l1, x2, l2, x3, l3, x4), x, y, l, m, i) =
    if(equ(k, l2),
        if(equ(i, 1),
            t3(t4(x1, l1, x2, l2, x3, l3, x4), l, x, m, y),
            if(equ(i, 2),
                t3(x, l, t4(x1, l1, x2, l2, x3, l3, x4), m, y),
                t3(x, l, y, m, t4(x1, l1, x2, l2, x3, l3, x4)))),
        if(less(k, l2),
            if(equ(i, 1),
                t4(insert(k, t2(x1, l1, x2)), l2,
                    t2(x3, l3, x4), l, x, m, y),
                if(equ(i, 2),
                    t4(x, l, insert(k, t2(x1, l1, x2)),
                        l2, t2(x3, l3, x4), m, y),
                    t4(x, l, y, m, insert(k, t2(x1, l1, x2)),
                        l2, t2(x3, l3, x4)))),
            if(equ(i, 1),
                t4(t2(x1, l1, x2), l2,
                    insert(k, t2(x3, l3, x4)),
                    l, x, m, y),
                if(equ(i, 2),
                    t4(x, l, t2(x1, l1, x2), l2,
                        insert(k, t2(x3, l3, x4)), m, y),
                    t4(x, l, y, m, t2(x1, l1, x2), l2,
                        insert(k, t2(x3, l3, x4))))))).
: The equations for if, equ, less, and member are the same as in Example 16.3.2.
□
Computation using Example 16.3.4 proceeds as follows. Upon encountering a
2-node or a 3-node, the node is locked (with chk2 or chk3) and the proper subtree
for insertion is located. That subtree becomes the second parameter of the function
chk2 or chk3. If the root of that subtree is a 4-node, then the 4-node and the
locked parent are restructured according to the transformations of Figure 16.3.3.
After restructuring, the parent node is released. If the root of the subtree is a 3-
node or a 2-node, then no restructuring is needed, and the parent node is released.
If the subtree is empty, then the locked parent node is a leaf, and we insert the
key. The equations also account for the possibility that the key to be inserted is in
the tree already. Using these equations, we can only attempt inserting a key into a
2-node or a 3-node, thus no upward traversal is needed, and insertions can be done
in parallel.
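The crucial top-down step, splitting a 4-node as soon as it is met, can be isolated in a few lines of Python (the tuple encoding is mine; this is the root case of Example 16.3.4).

```python
# Splitting a 4-node met on the way down: the middle key l2 becomes the
# key of a new 2-node whose sons are two 2-nodes, so no node above the
# current position ever needs to be revisited.

def split_root(t4):
    # t4 = ("t4", x1, l1, x2, l2, x3, l3, x4)
    _, x1, l1, x2, l2, x3, l3, x4 = t4
    return ("t2", ("t2", x1, l1, x2), l2, ("t2", x3, l3, x4))

E = ("e",)
root = ("t4", E, 2, E, 5, E, 8, E)
assert split_root(root) == ("t2", ("t2", E, 2, E), 5, ("t2", E, 8, E))
```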
One more variant of balanced search is programmed below, for comparison.
The algorithm expressed by these equations comes from the section on "top-down
algorithms" in [GS78]. The "black nodes" of [GS78] are represented by
node(s,i,t), and the "red nodes" are represented by red (node(s,i,t)). Some of the
where clauses restricting substitutions for variables in the equations are semanti-
cally unnecessary, but are required to avoid illegal overlapping of left-hand sides
(see restriction 4 in Section 5).
Example 16.3.5
Symbols
: Constructors for search trees
tree: 1;
node: 3;
red: 1;
nil: 0;
: Operations on search trees
insert: 2;
member: 2;
: Symbols used in the definition of insert
inserti: 2;
insertl, insertr: 2;
: Arithmetic and logical operators
if: 3;
less: 2;
equ: 2;
include integer_numerals, truth_values.
For all c, i, j, k, l, m, s, t, u, v, w, x, y, z:
: The symbol tree marks the root of a search tree.
tree(red(s)) = tree(s);
: insert(i, t) inserts the integer i into the search tree t.
insert(i, tree(s)) = tree(insert(i, s))
    where s is either node(t, j, u) or nil() end or end where;
insert(i, nil()) = red(node(nil(), i, nil()));
insert(i, red(s)) = red(insert(i, s))
    where s is either node(t, j, u) or nil() end or end where;
insert(i, node(red(s), j, red(t))) =
    red(inserti(i, node(s, j, t)));
insert(i, node(s, j, t)) = inserti(i, node(s, j, t))
    where s, t are either node(u, k, v) or nil() end or end where;
insert(i, node(s, j, t)) = inserti(i, node(s, j, t))
    where s is red(w), t is either node(u, k, v) or nil() end or end where;
insert(i, node(s, j, t)) = inserti(i, node(s, j, t))
    where t is red(w), s is either node(u, k, v) or nil() end or end where;
: inserti(i, t) inserts the integer i into the search tree t, assuming that t is a
: tree with at least two nodes, and the sons of the root are not both red.
inserti(i, node(s, j, t)) =
    if(equ(i, j), node(s, j, t),
        if(less(i, j), insertl(i, node(s, j, t)),
            insertr(i, node(s, j, t))));
: insertl(i, t) inserts the integer i into the left part of the tree t.
insertl(i, node(red(node(s, j, red(node(t, k, u)))), l, v)) =
    red(inserti(i, node(node(s, j, t), k, node(u, l, v))))
    where s is either node(w, m, x) or nil() end or end where;
insertl(i, node(s, j, t)) = node(insert(i, s), j, t)
    where s is either node(u, k, v) or nil() or red(node(w, l, x))
        where w, x are either node(y, m, z) or nil() end or end where
    end or end where;
: insertr(i, t) inserts the integer i into the right part of the tree t.
insertr(i, node(s, j, red(node(t, k, red(u))))) =
    red(inserti(i, node(node(s, j, t), k, u)))
    where t is either node(w, m, x) or nil() end or end where;
insertr(i, node(s, j, red(node(red(node(t, k, u)), l, v)))) =
    red(inserti(i, node(node(s, j, t), k, node(u, l, v))))
    where v is either node(w, m, x) or nil() end or end where;
insertr(i, node(s, j, t)) = node(s, j, insert(i, t))
    where t is either node(u, k, v) or nil() or red(node(w, l, x))
        where w, x are either node(y, m, z) or nil() end or end where
    end or end where;
: member(i, t) is true if and only if the integer i appears in the tree t.
member(i, nil()) = false;
member(i, tree(s)) = member(i, s)
    where s is either node(t, j, u) or nil() end or end where;
member(i, red(s)) = member(i, s);
member(i, node(s, j, t)) = if(equ(i, j), true,
    if(less(i, j), member(i, s),
        member(i, t)));
if(true, s, t) = s; if(false, s, t) = t;
include equint, lessint.
□
Notice that the membership test operator ignores balancing issues. Some
schemes for maintaining balanced search trees attach explicit balancing behavior to
the membership test. Such a technique is undesirable in the equational definitions
of search trees, because the lack of side-effects during evaluation means that any
balancing done explicitly by a membership test will not benefit the execution of
any other operations. The outermost evaluation strategy used by the interpreter
has the effect of delaying execution of insertions until the resulting tree is used, so
the actual computation is in fact driven by membership tests. It would be nice to
take advantage of the observation in [GS78] that balancing may also be made
independent of insertion, but that approach seems to lead into violations of the
nonoverlapping and left-sequential requirements of Section 5.
17. Sequential and Parallel Equational Computations
This section develops a formal mechanism for distinguishing the sequentializable
equational definitions supported by the current version of the equation interpreter,
from inherently parallel definitions, such as the parallel or. Section 19 goes
further, and shows that sequential systems, such as the Combinator Calculus, can-
not stepwise simulate the parallel or.
17.1. Term Reduction Systems
Term reduction systems, also called subtree replacement systems [O’D77], and
term rewriting systems [HL79], are a formal generalization of the sort of reduction
system that can be defined by sets of equations.
Definition 17.1.1
Let Σ be a ranked alphabet, with ρ(a) the rank of a for a ∈ Σ.
A Σ-tree is a tree with nodes labelled from Σ.
Σ△ = {α | α is a Σ-tree & every node in α labelled by a has exactly ρ(a) sons}.
□
If Σ is the set of symbols described in the Symbols section of an equational
definition, then Σ△ is the set of well-formed expressions that may be built from
those symbols.
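The definition can be restated as a small check. In this Python sketch (the representation is mine), a ranked alphabet is a map from symbols to ranks, and a term is a tuple whose head is its root symbol.

```python
# A term is well formed over a ranked alphabet when every node labelled a
# has exactly rank(a) sons, recursively.

def well_formed(term, rank):
    sym, args = term[0], term[1:]
    return rank.get(sym) == len(args) and all(
        well_formed(a, rank) for a in args)

rank = {"cons": 2, "nil": 0, "0": 0}
assert well_formed(("cons", ("0",), ("nil",)), rank)
assert not well_formed(("cons", ("nil",)), rank)   # cons needs 2 sons
```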
The following definition is based on Rosen’s notion of rule schemata [Ro73]
(Def. 6.1, p.175).
Definition 17.1.2
Let Σ be a ranked alphabet, and let V be an ordered set of nullary symbols not in Σ
to be used as formal variables. A rule schema is a pair <α, β>, written α → β,
such that:
(1) α, β ∈ (Σ∪V)△
(2) α ∉ V
(3) Every variable in β is also in α.
A rule schema α → β is left linear if, in addition, no variable occurs more than once
in α.
Formal variables of rule schemata will be written as boldface lower case letters, such as
x, y, z.
Assume that the variables in a left linear rule schema α → β are always chosen
so that x1, ..., xm occur in α in order from left to right.
If α → β is a rule schema over Σ, with variables x1, ..., xm ∈ V, and γ1, ..., γm ∈ Σ△,
then α[γ1/x1, ..., γm/xm] → β[γ1/x1, ..., γm/xm] is an instance of α → β.
Let S be a finite set of rule schemata over Σ. The system <Σ△, →S> is a term
reduction system, where →S is the least relation satisfying
α →S β if α → β is an instance of a schema in S;
α →S β ⇒ γ[α/x] →S γ[β/x], where γ ∈ (Σ∪V)△ contains precisely one occurrence
of the variable x ∈ V, and no other variables.
The subscript S is omitted whenever it is clear from context.
□
Every set S of equations following the context-free syntax of the equation
interpreter defines a term reduction system, in which →S represents one step of
computation. The restrictions on equations from Section 5 induce a special subclass of
term reduction systems.
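One reduction step can be sketched concretely. The following Python fragment (representation and names are mine; it assumes left-linear schemata, so a variable may be bound without an equality check) matches a schema instance and applies it at an outermost position, implementing the closure under one-hole contexts of Definition 17.1.2.

```python
# Variables are strings, constructor terms are tuples headed by a symbol.

def match(pat, term, env):
    if isinstance(pat, str):                  # a formal variable
        env[pat] = term
        return True
    if pat[0] != term[0] or len(pat) != len(term):
        return False
    return all(match(p, t, env) for p, t in zip(pat[1:], term[1:]))

def subst(pat, env):
    if isinstance(pat, str):
        return env[pat]
    return (pat[0],) + tuple(subst(p, env) for p in pat[1:])

def step(term, schemata):
    # Rewrite the first redex found in an outermost, left-to-right search;
    # return None if term is in normal form.
    for lhs, rhs in schemata:
        env = {}
        if match(lhs, term, env):
            return subst(rhs, env)
    for i, sub in enumerate(term[1:], 1):     # closure under contexts
        new = step(sub, schemata)
        if new is not None:
            return term[:i] + (new,) + term[i + 1:]
    return None

# if(true, x, y) = x
rules = [(("if", ("true",), "x", "y"), "x")]
t = ("if", ("true",), ("a",), ("b",))
assert step(t, rules) == ("a",)
```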
Definition 17.1.3
Let S be a set of rule schemata.
S is nonoverlapping if for all pairs of (not necessarily different) schemata α1 → β1
and α2 → β2, the following holds (without loss of generality assuming that the
variables are chosen in some standard order):
Let α1′ → β1′ and α2′ → β2′ be instances of the schemata above, with
α1′ = α1[γ1/x1, ..., γm/xm]. Suppose that α1′ = δ[α2′/x], where δ contains precisely
one occurrence of x, δ ≠ x. Then there is an i, 1 ≤ i ≤ m, such that the occurrence
of α2′ lies within γi.
Intuitively, S is nonoverlapping if rule schemata may only be overlaid at variable
occurrences, and at the root.
A set of rule schemata S is consistent if, for all pairs of schemata
α1 → β1, α2 → β2 ∈ S, the following holds. If α → β1′, α → β2′ are instances of the
schemata above, then β1′ = β2′.
Intuitively, β1 and β2 are the same up to choice of variable names in the two
schemata.
If S is a nonoverlapping, consistent set of left-linear rule schemata, then
<Σ△, →S> is a regular term reduction system.
□
This definition, adapted from [O'D77], is essentially the same as Huet and Lévy's
[HL79] and Klop's [Kl80a] (Ch. II), but Klop allows bound variables in expressions,
and, instead of requiring consistency, Huet-Lévy and Klop outlaw overlap at
the root entirely. The regular reduction systems are those defined by equations
satisfying restrictions 1-4 from Section 5. Restriction 5 is a bit more subtle, and is
treated in Sections 17.2 and 17.3.
17.2. Sequentiality
In discussion of procedural programming languages, "sequential" normally means
“constrained to operate sequentially." Reduction systems are almost never con-
strained to operate sequentially, since any number of the redexes in an expression
may be chosen for replacement. Rather, certain reduction systems are constrained
to be interpreted in parallel. So, a sequential reduction system is one that is
allowed to operate sequentially. For example, the equations for the conditional
if(true, x, y) = x; if(false, x,y) = y
allow for parallel evaluation of all three arguments to an if, but they also allow
sequential evaluation of the first argument first, then whichever of the other two
arguments is selected by the first one. The sequential evaluation described above
has the advantage of doing only work that is required for the final answer -- the
parallel strategy is almost certain to perform wasted work on the unselected argu-
ment. By contrast, consider the parallel or, defined by
or(true, x) = true;
or(x, true) = true;
or(false, false) = false
The or operator seems to require parallel evaluation of its two arguments. If
either one is selected as the first to be evaluated, it is possible that that evaluation
will be infinite, and will prevent the unselected argument from evaluating to true.
In the presence of the parallel or, there appears to be no way to avoid wasted
evaluation work in all cases.
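The interleaving needed for the parallel or can be illustrated in a modern notation. The following Python sketch is entirely illustrative: the names diverge, converge, and par_or are inventions of this illustration, not part of the interpreter. It steps its two arguments round-robin, so a divergent argument cannot block a true from the other side.

```python
def diverge():
    """An argument whose evaluation never terminates."""
    while True:
        yield None                 # one unit of work, no result yet

def converge(value, steps):
    """An argument that yields a boolean after a fixed amount of work."""
    for _ in range(steps):
        yield None
    yield value

def par_or(left, right):
    """or(true, x) = true; or(x, true) = true; or(false, false) = false,
    evaluated by stepping both arguments fairly so neither can starve."""
    live = [left, right]
    while live:
        for gen in list(live):
            result = next(gen, False)
            if result is True:
                return True        # or(true, x) = true: stop at once
            if result is False:
                live.remove(gen)   # this argument finished with false
    return False
```

Here par_or(diverge(), converge(True, 3)) returns true even though its first argument never terminates, which is exactly what a fixed sequential order of evaluation cannot guarantee.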
It is not obvious how to formally distinguish the sequential reduction systems
from the parallel ones. Huet and Lévy [HL79] discovered a reasonable technical
definition, based on a similar definition of sequential predicates by Kahn and Plot-
kin [KP78]. The key idea in Huet’s and Lévy’s definition is to regard a redex as a
not-yet-known portion of an expression. Certain portions of an expression, e.g., γ
in if(true, β, γ), need not be known in order to determine a normal form. Others,
e.g. α in if(α, β, γ), must be known. A sequential term reduction system is one in
which we may always identify at least one unknown position (redex) that must be
known to find a normal form.
In order to discuss unknown portions of an expression, Huet and Lévy use the
nullary symbol ω to represent unknowns, and ≥ for the relation "more defined than."
Definition 17.2.1 [HL79]
Let α, β ∈ T_{Σ∪V∪{ω}}.
α ≥ β if β is obtained from α by replacing zero or more subexpressions in α with ω.
Let S be a set of rule schemata. α ∈ T_{Σ∪{ω}} is a partial redex if there is a rule
schema α′→β ∈ S such that α′ ≥ α.
α is a definite potential redex if there is a rule schema β→γ ∈ S and a term α′,
such that α′ ≥ α and α′ →* β.
α is root stable if α is not a definite potential redex.
A total normal form for T_{Σ∪{ω}} is a normal form in T_Σ.
Let α ∈ T_{Σ∪{ω}}, and let α have no total normal form.
An index of α is a term α′ ∈ T_{Σ∪{ω,x}} with precisely one occurrence of x, such
that α = α′[ω/x], and
    for all β ≥ α, if β has a total normal form, then there is a γ ≠ ω with β ≥ α′[γ/x].
A regular reduction system is sequential if every term containing ω with no total
normal form has at least one index.
□
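The relation ≥ is easy to realize concretely. Here is a minimal Python sketch, assuming terms are represented as tuples of a symbol followed by its subterms, with ω as a bare symbol; the representation is illustrative, not the interpreter's.

```python
OMEGA = "ω"   # the nullary symbol standing for an unknown subterm

def more_defined(alpha, beta):
    """Return True iff alpha >= beta: beta is obtained from alpha by
    replacing zero or more subexpressions of alpha with ω.
    A term is a tuple (symbol, subterm, ...); ω is the bare symbol."""
    if beta == OMEGA:
        return True               # anything refines an unknown
    if alpha == OMEGA:
        return False              # ω refines nothing but ω itself
    if alpha[0] != beta[0] or len(alpha) != len(beta):
        return False
    return all(more_defined(a, b) for a, b in zip(alpha[1:], beta[1:]))
```

For example, f(a, ω) ≥ f(ω, ω) holds, but not the converse, and every term is ≥ ω.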
Unfortunately, sequentiality is not a decidable predicate, and even indexes of
sequential systems are not always computable [HL79]. So, Huet and Lévy define a
stronger property called strong sequentiality. Essentially, a reduction system is
strongly sequential if its sequentiality does not depend on the right-hand sides of
rules.
Definition 17.2.2 [HL79]
α →ω β if β is the result of replacing some redex in α with an arbitrarily chosen
term in T_Σ. Let α ∈ T_{Σ∪{ω}}, and let there be no total normal form δ such that
α →ω* δ. An index α′ of α is a strong index of α if
    for all β ≥ α, if β →ω* δ for some total normal form δ, then there is a γ ≠ ω with β ≥ α′[γ/x].
A set of rule schemata is strongly sequential if every term α containing ω such that
α is in (nontotal) normal form, but for no δ in total normal form does α →ω* δ, con-
tains at least one strong index.
α is a potential redex if there is a rule schema β→γ ∈ S and a term α′, such that
α′ ≥ α and α′ →ω* β.
α is strongly root stable if α is not a potential redex.
□
Huet and Lévy give a decision procedure to detect strongly sequential systems,
and an algorithm to find a strong index when one exists, or report strong stability
when there is no index. These are precisely the sorts of algorithms needed by the
equation interpreter. The sequentiality detection is used in a preprocessor to detect
and reject nonsequential equations. The other algorithms are used to determine a
traversal order of an input expression, detecting redexes when they are met. When
an expression is found to be strongly root stable, the root may be output immedi-
ately, and the remaining subexpressions evaluated in any convenient order. Unfor-
tunately, Huet’s and Lévy’s algorithms are complex enough that it is not clear
whether they have acceptable implementations for the equation interpreter. So, the
current version of the equation interpreter applies the substantially stronger restric-
tion of strong left-sequentiality to guarantee a particularly simple algorithm for
choosing the sequential order of evaluation.
In Section 19, in trying to characterize the nondeterministic computational
power of equationally defined languages, we treat another simplified notion of
sequentiality. Simple strong sequentiality is defined to allow choice of a sequen-
tial order of evaluation by a process with no memory, simply on the basis of order-
ing the arguments to each function symbol independently.
Definition 17.2.3
Let S be a strongly sequential set of rule schemata. S is simply strongly sequential
if there is a sequencing function s: T_{Σ∪{ω}} → T_{Σ∪{ω,x}} such that, for all par-
tial redexes α and β, containing at least one ω each, s(α) is a strong index in α,
and s(α)[s(β)/x] is a strong index in s(α)[β/x].
□
A system with left-hand sides f(a,a), g(f(b,x)), h(f(x,b)) is strongly sequen-
tial, but not simply strongly sequential, because the sequencing function cannot
choose an index in f(ω,ω) without knowing whether there might be a g or h above.
17.3. Left-Sequentiality
The strongly left-sequential systems of equations, defined in this section, were
designed to support an especially simple pattern-matching and sequencing algo-
rithm in the equation interpreter. Section 18.2.3 presents that algorithm, and
shows that it succeeds on precisely the strongly left-sequential systems.
Definition 17.3.1
A Σ-context is a term in T_{Σ∪{ω}}.
An instance of a context α is any term or context β resulting from the replacement
of one or more occurrences of ω in α, i.e., β ≥ α.
A left context is a context α such that there is a path from the root of α to a leaf,
with no occurrences of ω on or to the left of the path, and nothing but ωs to the
right of the path.
A left-traversal context is a pair <α,l>, where α is a left context, and l is a
node on the path dividing ωs from other symbols in α.
An index α′ of α is a strong root index of α if
    for all β ≥ α, if β →ω* δ for some root stable δ, then there is a γ ≠ ω with β ≥ α′[γ/x].
A redex β in a term α is essential if α = α′[β/x] where α′ is a strong index of α.
A redex β in a term α is root-essential if α = α′[β/x] where α′ is a strong root
index of α.
□
A context represents the information known about a term after a partial traversal.
The symbol w stands for an unknown portion. A left-traversal context contains
exactly the part of a term that has been seen by a depth-first left traversal that has
progressed to the specified node. →ω from Definition 17.2.2 is the best approxi-
mation to reduction that may be derived without knowing the right-hand sides of
equations. A strong root index is an unknown portion of a term that must be at
least partially evaluated in order to produce a strongly root stable term. Since →ω
allows a redex to reduce to anything, a strong root index must be partially
evaluated to produce a redex, which may always be ω-reduced to a normal form.
In the process of reducing a term by outermost reductions, our short-term goal
is to make the whole term into a redex. If that is impossible, then the term is root
stable, and may be cut down into independent subproblems by removing the root.
Definition 17.3.2
A set of equations is strongly left-sequential if there is a set of left-traversal con-
texts L such that the following conditions hold:
1. For all <α,l> in L, the subtree of α rooted at l is a redex.
2. For all <α,l> in L and β an instance of α, the redex at l is essential in β.
3. For all left-traversal contexts <α,l> not in L and β an instance of α, no root-
essential redex of β occurs at l.
4. Every term is either root stable or an instance of a left context in L.
□
In a strongly left-sequential system, we may reduce a term by traversing it in
preorder to the left. Whenever a redex is reached, the left-traversal context speci-
fying that redex is checked for membership in L. If the left context is in L, the
redex is reduced. Otherwise, the traversal continues. When no left context in L is
found, the term must be root stable, so the root may be removed, and the resulting
subterms processed independently. (1) and (2) guarantee that only essential
redexes are reduced. (3) guarantees that no root-essential redex is skipped. (4)
guarantees that the reduction never hits a dead end by failing to choose any redex.
The analogous property to strong left-sequentiality, using reduction instead of
ω-reduction, is undecidable. Notice that strong left-sequentiality, like strong sequen-
tiality, depends only on the left-hand sides of equations, not on the right-hand
sides.
Strongly left-sequential sets of equations are intended to include all of those
systems that one might reasonably expect to process by scanning from left to right.
Notice that Definition 17.3.2 does not explicitly require L to be decidable. Also, a
strongly left-sequential system may not necessarily be processed by leftmost-
outermost evaluation. Rather than requiring us to reduce a leftmost redex,
Definition 17.3.2 merely requires us to decide whether or not to reduce a redex in
the left part of a term, before looking to the right. Every redex that is reduced
must be essential to finding a normal form. When the procedure decides not to
reduce a particular redex, it is only allowed to reconsider that choice after produc-
ing a root-stable term and breaking the problem into smaller pieces. Section 18.2.3
shows a simple algorithm for detecting and processing strongly left-sequential sys-
tems. While strongly left-sequential systems are defined to allow a full depth-first
traversal of the term being reduced, the algorithm of Section 18.2.3 avoids search-
ing to the full depth of the term in many cases by recognizing that certain sub-
terms are irrelevant to choosing the next step.
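For a system as simple as the conditional of Section 17.2, the discipline just described coincides with ordinary leftmost-outermost rewriting. The Python sketch below is an illustrative encoding, not the interpreter's algorithm; terms are tuples, and match assumes left-linear patterns.

```python
VARS = {"x", "y"}          # variable symbols of the rule schemata

def match(pattern, term, env=None):
    """Match a left-linear pattern against a term; bindings or None."""
    env = dict(env or {})
    head, *pargs = pattern
    if head in VARS:
        env[head] = term
        return env
    if term[0] != head or len(term) != len(pattern):
        return None
    for p, t in zip(pargs, term[1:]):
        env = match(p, t, env)
        if env is None:
            return None
    return env

def subst(term, env):
    head, *args = term
    if head in VARS:
        return env[head]
    return (head,) + tuple(subst(a, env) for a in args)

def rewrite_once(term, rules):
    """One step: try the root first, then arguments left to right."""
    for lhs, rhs in rules:
        env = match(lhs, term)
        if env is not None:
            return subst(rhs, env), True
    head, *args = term
    for i, arg in enumerate(args):
        new, changed = rewrite_once(arg, rules)
        if changed:
            return (head,) + tuple(args[:i]) + (new,) + tuple(args[i + 1:]), True
    return term, False

def normalize(term, rules):
    changed = True
    while changed:
        term, changed = rewrite_once(term, rules)
    return term

RULES = [(("if", ("true",), ("x",), ("y",)), ("x",)),
         (("if", ("false",), ("x",), ("y",)), ("y",))]
```

Reducing if(true, a, if(false, b, c)) this way yields a without ever evaluating the redex in the discarded third argument, the economy the sequential discipline is meant to secure.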
18. Crucial Algorithms and Data Structures for Processing
Equations
Design of the implementation of the equation interpreter falls naturally into four
algorithm and data structure problems.
1. Choose an efficient representation for expressions.
2. Invent a pattern-matching algorithm to detect redexes.
3. Invent an algorithm for selecting the next redex to reduce.
4. Choose an algorithm for performing the selected reduction.
The four subsections of this section correspond roughly to the four problems above.
For sequential equations, the choice of the next redex to reduce is intimately tan-
gled with pattern matching, so the sequencing problem is treated in Section 18.2,
along with pattern matching. For inherently parallel equations, an additional
mechanism is required to manage the interleaved sequence of reductions. This
mechanism has not been worked out in detail, but what we know of it is described
in Section 18.3. The current version of the equation interpreter does the most obvi-
ous procedure for performing reductions. Section 18.4 mentions some future
optimizations to be tested for that problem.
18.1. Representing Expressions
Because reduction to normal form involves repeated structural changes to an
expression, some sort of representation of tree structure by pointers seems inescapa-
ble. There are still a number of options to consider, and we are only beginning to
get a feeling for which is best. The most obvious representation of expressions is
similar to the one used in most implementations of LISP to represent S-expressions.
Each function and constant symbol occupies one field of a storage record, and the
other fields contain pointers to the representations of the arguments to that func-
tion. It is quite easy in this representation to avoid multiple copies of common
subexpressions, by allowing several pointers to coalesce at a single node. Thus,
although the abstract objects being represented are trees, the representations are
really directed acyclic graphs. Figures 18.1.1 and 18.1.2 show the expression
h(f(g(a,b), g(a,b)), g(a,b))
with and without sharing of the subexpression g(a,b).
Figure 18.1.1
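The sharing just described can be obtained automatically by hash-consing: keep a table keyed by a symbol and its argument nodes, and reuse an existing node whenever the same combination is requested again. A small Python sketch follows; the class and method names are invented for illustration.

```python
class Dag:
    """Expression DAG builder that shares common subexpressions:
    each distinct (symbol, children) combination is built exactly once."""

    def __init__(self):
        self.table = {}

    def node(self, symbol, *children):
        key = (symbol,) + tuple(id(c) for c in children)
        if key not in self.table:
            self.table[key] = (symbol, children)
        return self.table[key]

d = Dag()
a, b = d.node("a"), d.node("b")
expr = d.node("h",
              d.node("f", d.node("g", a, b), d.node("g", a, b)),
              d.node("g", a, b))
# the three requests for g(a,b) coalesce at one node, so only five
# nodes exist: a, b, g(a,b), f(g(a,b), g(a,b)), and the h node
```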
Unlike LISP, whose nodes are all either nullary or binary (0 or 2 pointers),
the equation interpreter requires as many pointers in a node as the arity of the
function symbol involved. The issue of how best to handle the variety of arities has
not been addressed. Presumably, some small maximum number of pointers per
physical node should be chosen, perhaps as small as 2, and virtual nodes of higher
Figure 18.1.2
arities should be linked together from the smaller physical nodes, although
varying-size physical nodes are conceivable as a solution. The best choice of physi-
cal node size, as well as the structure for linking them together (probably a linear
list, but possibly a balanced tree) can best be chosen after experience with the
interpreter yields some statistics on its use of storage. Currently, we simply allo-
cate a node size sufficient to handle the highest arity occurring in a set of equa-
tions. This is certainly not the right solution in the long run, but since storage lim-
its have not been a limiting factor in our experimental stage, it was a good way to
get started.
Depending on the pattern-matching and sequencing algorithms chosen, it may
be necessary to keep pointers from sons to fathers in the representation of an
expression, as well as the usual father-son pointers. The most obvious solution --
one back pointer per node -- does not work, because the sharing of subexpressions
allows a node to have arbitrarily many fathers. Some sort of linked list of fathers
seems unavoidable. An elegant representation avoids the use of any extra nodes to
link together the fathers. Both the son-father and father-son links are subsumed by
a circular list of a son and all of its fathers. When a father appears on this list, it
is linked to the next node through a son pointer, corresponding to the argument
position held by the unique son on that list with respect to that particular father.
The unique son on the list is linked through a single extra father pointer. Figure
18.1.3 shows the expression
h(f(g(a,b), g(a,b)), g(a,b))
again, with the g(a,b) subexpressions shared, represented with circular father-son
lists. Since each node may participate in a number of circular lists equal to its
arity, the pointers must actually point to a component of a node, not to the node as
a whole. Notice that, in order to get from a father to its ith son, it is necessary to
follow the circular list going through the father’s ith son pointer until that list hits
the component corresponding to a father pointer. The node containing the unique
father pointer in the circular list is the unique son on that list. The possibilities for
breaking a virtual node of high arity into a linked structure of smaller physical
nodes are essentially the same as with the LISP-style representation above, except
that the linkage within a virtual node must also be two-way or circular. The
pattern-matching algorithm used in the current version of the equation interpreter
does not require back pointers, so the simpler representation is used. An earlier
version used back pointers, and we have not had sufficient experience with the
interpreter to rule out the possibility of returning to that representation in a later
version.
Figure 18.1.3
18.2. Pattern Matching and Sequencing
The design of the equation interpreter was based on the assumption that the time
overhead of pattern matching would be critical in determining the usability of the
interpreter. Each visit to a node in the equation interpreter corresponds roughly to
one recursive call or return in LISP, so the pattern-matching cost per node visit
must compete with manipulation of the recursion stack in LISP. So, we put a lot
of effort into clever preprocessing that would allow a run-time pattern-matching
cost of a few machine instructions per node visit. By contrast, most implementa-
tions of Prolog use a crude pattern-matching technique based on sequential search,
and depend on the patterns involved being simple enough that this search will be
acceptably quick. An indexing on the leftmost symbol of a Prolog clause limits the
search to those clauses defining a single predicate, but even that set may in princi-
ple be quite large. Prolog implementations were designed on the assumption that
unification [Ko79b] (instantiation of variables) is the critical determinant of perfor-
mance. Only an utterly trivial sort of unification is used in the equation inter-
preter, so our success does not depend on that problem.
We do not have sufficient experience yet to be sure that the pains expended on
pattern matching will pay off, but if equational programming succeeds in providing
a substantially new intuitive flavor of programming, extremely efficient pattern
matching is likely to be essential. Pattern matching based on sequential search
allows the cost of running a single step of a program to grow proportionally to the
number of equations and complexity of the left-hand sides. This growth
discourages the use of many equations with substantial left-hand sides. Equational
programs with only a few equations, with simple left-hand sides, tend to be merely
syntactically sugared LISP programs, and therefore not worthy of a new implemen-
tation effort when so many good LISP processors are already available.
All of our pattern-matching algorithms are based on the elegant string-
matching algorithm using finite automata by Knuth, Morris, and Pratt [KMP77],
and its extension to multiple strings by Aho and Corasick [AC75]. The essential
idea is that a set of pattern strings may be translated into a finite automaton, with
certain states corresponding to each of the patterns. When that automaton is run
on a subject string, it enters the accepting state corresponding to a given pattern
exactly whenever it reaches the right end of an instance of that pattern in the sub-
ject. Furthermore, the number of states in the automaton is at most the sum of the
lengths of the patterns, and the time required to build it is linear (with a very low
practical overhead) in the lengths as well. In particular, the automaton always
contains a tree structure, called a trie, in which each leaf corresponds to a pattern,
and each internal node corresponds to a prefix of one or more patterns. As long as
the subject is matching some pattern or patterns, the computation of the automaton
goes from the root toward the leaves of that tree. The clever part of the algorithm
involves the construction of backward, or failure, transitions for those cases where
the next symbol in the subject fails to extend the pattern prefix matched so far. A
finite automaton, represented by transition tables, is precisely the right sort of
structure for fast pattern matching, since the processing of each symbol in the sub-
ject requires merely a memory access to that table, and a comparison to determine
whether there is a match.
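For reference, the Aho-Corasick construction can be sketched compactly as follows (a standard rendering in Python; goto holds the trie, fail the backward transitions, and out the patterns recognized at each state):

```python
from collections import deque

def build_automaton(patterns):
    """Build the Aho-Corasick automaton: a trie of the patterns plus
    failure transitions computed breadth-first."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    queue = deque(goto[0].values())       # depth-1 states fail to the root
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f][ch] if ch in goto[f] else 0
            out[s] |= out[fail[s]]        # inherit shorter suffix matches
    return goto, fail, out

def matches(text, automaton):
    """Report (end position, pattern) for every pattern occurrence."""
    goto, fail, out = automaton
    s, found = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]                   # backward (failure) transition
        s = goto[s].get(ch, 0)
        for p in out[s]:
            found.append((i, p))
    return found
```

Each subject symbol costs one table access plus, amortized, a constant number of failure steps, which is the property the equation interpreter's matchers aim to preserve.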
Tree pattern matching is different enough from the string case that explora-
tion of several extensions of the Aho-Corasick technique to trees consumed sub-
stantial effort in the early stages of the interpreter project. Those efforts are
described more fully in [HO82a, HO84], and are merely summarized here. In addi-
tion to the problems arising from working with trees instead of strings, we must
face the problem of incremental execution of the pattern matcher after each reduc-
tion step. It is clearly unacceptable to reprocess the entire expression that is being
reduced after each local reduction step, so our pattern-matching algorithm must be
able to pick up some stored context and rescan only the portion affected by each
change.
A decisive factor in the final choice of a pattern matcher turned out to be its
integration with the sequencer. Although not so decisive in the design of the
current implementation, the added complication of dealing with disjunctive where
clauses restricting the substitutions for variables may ruin an algorithm that runs
well on patterns with unrestricted variable substitutions.
18.2.1. Bottom-Up Pattern Matching
Three basic approaches were found, each allowing for several variations. The first,
and perhaps most obvious, is called the bottom-up method, because information
flows only from the leaves of a tree toward the root. Each leaf is assigned a match-
ing state corresponding to the constant symbol appearing there. Each function
symbol has a table, whose dimension is equal to the arity of the symbol, and that
table determines the matching state attached to each node bearing that function
symbol, on the basis of the states attached to its sons. Every matching state may
be conceived as representing a matching set of subtrees of the given patterns that
can all match simultaneously at a single subtree of a subject. In particular, certain
states represent matches of complete patterns. In the special case where every
function symbol has arity 1, the bottom-up tables are just the state-transition tables
for the Aho-Corasick string matching automaton [HO82a].
Example 18.2.1.1
Consider the pattern f(g(h(x), h(a)), h(y)), with variables x and y. The match-
ing sets and states associated with this pattern are:
1: {x, y}
2: {x, y, a}
3: {x, y, h(x), h(y)}
4: {x, y, h(x), h(y), h(a)}
5: {x, y, g(h(x), h(a))}
6: {x, y, f(g(h(x), h(a)), h(y))}
Set 6 indicates a match of the entire pattern. Assuming that there is one more
nullary symbol, b, the symbols correspond to tables as follows:
For f (rows index the state of the first argument, columns the state of the second):
1 | 1 1 1 1 1 1
2 | 1 1 1 1 1 1
3 | 1 1 1 1 1 1
4 | 1 1 1 1 1 1
5 | 1 1 6 6 1 1
6 | 1 1 1 1 1 1
For g:
1 | 1 1 1 1 1 1
2 | 1 1 1 1 1 1
3 | 1 1 1 5 1 1
4 | 1 1 1 5 1 1
5 | 1 1 1 1 1 1
6 | 1 1 1 1 1 1
Figure 18.2.1.1 shows the matching states assigned to all nodes in the tree
representing the expression
f(f(g(h(a), h(a)), h(b)), f(g(h(b), h(b)), h(a))).
□
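The run of Example 18.2.1.1 is easy to reproduce. In the Python sketch below, the tables for f and g are those of the example; the one-dimensional table for h is not printed above, so the version here is derived from the matching sets and should be read as a reconstruction.

```python
# Tables for the pattern f(g(h(x), h(a)), h(y)) of Example 18.2.1.1.
# States 1-6 are the matching sets listed there.
STATE_A, STATE_B = 2, 1                     # nullary symbols a and b
H = {1: 3, 2: 4, 3: 3, 4: 3, 5: 3, 6: 3}    # h(a) only when the son is a

def f_table(left, right):
    return 6 if left == 5 and right in (3, 4) else 1

def g_table(left, right):
    return 5 if left in (3, 4) and right == 4 else 1

def state(term):
    """Assign matching states bottom-up; state 6 at a node means the
    whole pattern matches there."""
    head, *args = term
    if head == "a":
        return STATE_A
    if head == "b":
        return STATE_B
    if head == "h":
        return H[state(args[0])]
    table = f_table if head == "f" else g_table
    return table(state(args[0]), state(args[1]))

subject = ("f",
           ("f", ("g", ("h", ("a",)), ("h", ("a",))), ("h", ("b",))),
           ("f", ("g", ("h", ("b",)), ("h", ("b",))), ("h", ("a",))))
```

Running state on the subject assigns state 6 to its first argument, f(g(h(a), h(a)), h(b)), which is indeed an instance of the pattern, and state 1 everywhere a match is impossible.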
The bottom-up method is ideal at run time. An initial pass over the input
expression sets all of the match states, which are stored at each node. After a
reduction, the newly created right-hand side symbols must have their match states
computed, plus a certain number of nodes above the point of the reduction may
have their match states changed. The length of propagation of changes to states is
at most the depth of the deepest left-hand side, and is usually shorter than that.
Figure 18.2.1.1
Because of sharing, there may be arbitrarily many paths toward the root along
which changes must be propagated. The more complex representation of expres-
sions allowing traversal from sons to fathers as well as fathers to sons must be used
with the bottom-up method.
Unfortunately, the bottom-up method gets into severe trouble with the size of
the tables, and therefore with the preprocessing time required to create those tables
as well. There are two sources of explosion:
1. A symbol of arity n requires an n-dimensional table of state transitions.
Thus, the size of one symbol’s contribution to the tables is s^n, where s is the
number of states.
2. In the worst case, the number of states is nearly 2^p, where p is the sum of
the sizes of all the patterns (left-hand sides of equations).
The exponential explosion in number of states dominates the theoretical worst case,
but never occurred in the two years during which the bottom-up method was used.
In [HO82a] there is an analysis of the particular properties of patterns that lead to
the exponential blowup. Essentially, it requires subpatterns that are incomparable
in their matching power -- subjects may match either one, both, or neither. For
example, f(x,a) and f(a,x) are incomparable in this sense, because f (b,a)
matches the first, f (a,b) matches the second, f(a,a) matches both, and f (b,b)
matches neither. Such combinations of patterns are quite unusual in equational
programs.
The theoretically more modest increase in table size due to the arity of sym-
bols had a tremendous impact in practice. By suitable encodings, such as Currying
(see Section 4.4), the maximum arity may be reduced to 2. In principle, reducing
the arity could introduce an exponential explosion in the number of states, but this
never happened in practice. Unfortunately, for moderate sized equational pro-
grams, such as those used in the syntactic front end of the equation interpreter
itself, even quadratic size of tables is too much, leading to hour-long preprocessing.
Tables may be compressed, by regarding them as trees with a level of branching
for each dimension, and sharing equivalent subtrees. The resulting compression is
quite substantial in practice, giving the appearance of a linear dependence with
constant factor around five or ten. The time required to generate the large tables
and then compress them is still unacceptable. For a short time, we used a program
that produced the compressed tables directly, but it required a substantial amount
of searching for identical subtrees, and the code was so complicated as to be
untrustworthy. Cheng, Omdahl, and Strawn (Iowa State University) have made
an extensive experimental study of several techniques for improving the bottom-up
tables for a particular set of patterns arising from APL idioms. Their work sug-
gests that a carefully hand-tuned bottom-up approach may be good for static sets
of patterns, but substantial improvement in the algorithm is needed to make it a
good choice when ever-changing pattern sets require completely automatic prepro-
cessing.
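One level of that compression can be illustrated by interning identical rows of a two-dimensional table (a toy Python sketch; the actual tables are trees with a level of branching per dimension):

```python
def compress(table):
    """Share identical rows of a 2-D transition table: equal rows
    collapse to a single interned tuple."""
    interned = {}
    return [interned.setdefault(tuple(row), tuple(row)) for row in table]

# the f table of Example 18.2.1.1: five identical rows, one distinct
f_rows = [[1, 1, 1, 1, 1, 1]] * 4 + [[1, 1, 6, 6, 1, 1]] + [[1, 1, 1, 1, 1, 1]]
shared = compress(f_rows)
# only two physically distinct rows remain after sharing
```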
The problems with table size alone may well have killed the bottom-up
method, but more trouble arose when we considered the sequencing problem. The
only bottom-up method for sequencing that we found was to keep track simultane-
ously of the set of subpatterns matched by a node (the match set), and the set of
subpatterns that might come to match as a result of reductions at descendants of
the node (the possibility set). The reduction strategy would be driven by an
attempt to narrow the difference between the match set and the possibility set at a
node. When no redex appears in the possibility set at a node, that node is stable,
and may be output. A precise definition of possibility sets appears in the appendix
to [HO79], and may also be derived from [HL79]. Possibility sets, just as match
sets, may be enumerated once during preprocessing, then encoded into numerical
state names. Possibility sets may explode exponentially, even in cases where the
match sets do not. Possibility sets were never implemented for the equation inter-
preter, and the bottom-up versions actually performed a complete reduction
sequence, by adding each redex to a queue as it was discovered, then reducing the
one at the head of the queue -- a very wasteful procedure. Bottom-up pattern
matching in the equation interpreter will probably not be resurrected until a
simpler sequencing technique is found for it.
A final good quality of the bottom-up method, although not good enough to
save it, is its behavior with respect to ors in where clauses. A left-hand side of an
equation in the form
E = F where x is either G1 or ... or Gn end or end where
may be treated as a generalized pattern of the form E[or(G1, ..., Gn)/x] (the
special expression or(G1, ..., Gn) is substituted for each occurrence of x). A pat-
tern of the form or(G1, ..., Gn) matches an expression when any one of
G1, ..., Gn matches. The subpatterns involved in these ors have no special impact
on the number of states for bottom-up matching.
18.2.2. Top-Down Pattern Matching
The second approach to pattern-matching that was used in the equation interpreter
is the top-down approach. Every path from root to leaf in a pattern is taken as a
separate string. Numerical labels indicating which branch is taken at each node
are included, as well as the symbols at the nodes. Variable symbols, and the
branches leading to them, are omitted. If one string is a prefix of another one
associated with the same pattern, it may be omitted.
Example 18.2.2.1
The expression f(g(a, x), h(y, b)), whose tree form is shown in Figure 18.2.2.1,
produces the strings f1g1a, f1g, f2h, f2h2b. The strings f1g and f2h may be omitted,
because they are prefixes of f1g1a and f2h2b, respectively.
Figure 18.2.2.1
□
The Aho-Corasick algorithm may be applied directly to the set of strings derived in
this way. Then, in a traversal of a subject tree, we may easily detect the leaf-ends
of all subject paths that match a pattern path. By keeping states of the automaton
on the traversal stack, we may avoid restarting the automaton from the root for
each different path.
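Generating the path strings is a simple traversal. The Python sketch below (an illustrative encoding, with patterns as tuples) reproduces Example 18.2.2.1, including the omission of strings that are prefixes of others:

```python
def path_strings(pattern, variables):
    """Root-to-leaf path strings of a pattern: branch numbers alternate
    with symbols; variables and the branches leading to them are
    omitted, as is any string that is a prefix of another."""
    def walk(term, prefix):
        head, *args = term
        prefix = prefix + [head]
        if not args:
            return [prefix]
        paths = []
        for i, arg in enumerate(args, 1):
            if arg[0] in variables:
                paths.append(prefix)        # path ends above the variable
            else:
                paths += walk(arg, prefix + [str(i)])
        return paths
    strings = {"".join(p) for p in walk(pattern, [])}
    return sorted(s for s in strings
                  if not any(t != s and t.startswith(s) for t in strings))
```

For f(g(a, x), h(y, b)) with variables x and y, this yields f1g1a and f2h2b once the prefixes f1g and f2h are dropped.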
The hard part of the top-down method is correlating all the individual path
matches to discover complete matches of patterns. We tried two ways of doing
this. In the first, a counter for each pattern is associated with each node on the
recursion stack. Each time a path is matched, the appropriate counter at the root
end is incremented. If the traversal stack is stored as an array, rather than a
linked list, the root end may be found in constant time by using a displacement in
the stack. Whenever a counter gets up to the number of leaves in the appropriate
pattern, a match is reported. In the worst case, every counter could go up nearly
to the number of leaves in the pattern, leading to a quadratic running time. In a
short experiment with this method, carried out by Christoph Hoffmann’s advanced
compiler class in 1982, such a case was not observed. [HO82a] analyzes the quali-
ties of patterns that lead to good and bad behavior for the top-down method with
counters, but two other problems led to its being abandoned.
After a change in the subject tree, resulting from a reduction step, the region
above the change must be retraversed, and the counters corrected. This requires
that the old states associated with the region be saved for comparison with the new
ones, since only a change from acceptance to rejection, or vice versa, results in a
change to a counter. This retraversal was found to be quite clumsy when it was
tried. Also, no sequencing method for top-down pattern matching with counters
was ever discovered.
Another variation on top-down pattern matching, using bit strings to correlate
the path matches, led to much greater success. Each time a path matches, a bit
string is created with one bit for each level of the pattern. All bits are set to 0,
except for a 1 at the level of the leaf just matched. These bit strings are combined
in a bottom-up fashion, shifting as they go up the tree, and intersecting all the bit
strings at sons of a node to get the bit string for that node. When a 1 appears at
the root level, a match is detected. The details are given in [HO82a].
Example 18.2.2.2
Consider again the pattern f(g(h(x), h(a)), h(y)) from Example 18.2.1.1, run-
ning on the expression
f(f(g(h(a), h(a)), h(b)), f(g(h(b), h(b)), h(a))).
Figure 18.2.2.2 shows the tree representation of this expression. A * is placed at
the leaf end of each path matching a root-to-leaf path in the pattern. Each node is
annotated with the bit string that would be computed for it in the bottom-up por-
tion of the matching.
□
Careful programming yields an algorithm in which the top-down and bottom-up
activities are combined in a single traversal, and bit strings, as well as automaton
states, are only stored on the traversal stack, not at all nodes of the tree. Multiple
patterns are handled conceptually by multiple bit strings. It is easy, however, to
pack all of the strings together, and perform one extra bitwise and with every shift
to prevent bits shifting between the logical bit strings.
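The packing trick can be sketched directly on Python integers. The field widths and layout below are illustrative assumptions, not the interpreter's actual word format: each pattern owns a contiguous field of bits, and a precomputed guard mask clears, after every shift, the one bit per field that would otherwise spill in from its neighbor.

```python
def guard_mask(field_widths):
    """Mask keeping every bit of each pattern's field except its top
    bit, which is where a neighboring field's bit would land after
    a right shift of the packed word."""
    mask, offset = 0, 0
    for width in field_widths:
        mask |= ((1 << (width - 1)) - 1) << offset
        offset += width
    return mask

def guarded_shift(word, field_widths):
    # one shift for all patterns at once, plus one extra bitwise and
    return (word >> 1) & guard_mask(field_widths)

# Two patterns, three levels each: pattern 1's lowest bit (bit 3) must
# not leak into pattern 0's field when the packed word is shifted.
print(bin(guarded_shift(0b001000, [3, 3])))   # 0b0: the stray bit is cleared
print(bin(guarded_shift(0b010000, [3, 3])))   # 0b1000: a legal shift inside field 1
```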
The top-down method with bit strings adapts well to incremental matching.
After a change in the tree, merely reenter the changed portion from its father. The
traversal stack still contains the state associated with the father, and the original
processing of the changed portion has not affected anything above it, so there is no
special retraversal comparable to the bottom-up method, or the top-down method
with counters. Sequencing is handled by one more bit string, with one bit for each
level of the pattern. In this string, a bit is 1 if there is a possible match at that
level. The two bit strings used by the top-down method, in fact, correspond to the
Figure 18.2.2.2
match sets and possibility sets of the bottom-up method. Since the top-down pro-
cessing has already determined which root-to-leaf paths of a pattern are candidates
for matching at a given point, the match and possibility sets need only deal with
subpattern positions along a single path. Thus, instead of precomputing sets of
subpatterns, then numbering them, we may store them explicitly as bit strings,
using simple shifts and bitwise ands and ors to combine them. Such a technique
did not work for the bottom-up method, because the shifts would become compli-
cated shuffle operations, depending on the tree structure of the patterns.
The top-down pattern matching techniques do not perform as well as the
bottom-up method with respect to disjunctions in where clauses restricting the sub-
stitutions for variables in equations. No better strategy has been found than to
treat each disjunctive pattern as a notation for its several complete alternatives.
For example,
f(x, y) = g(x) where x is either h(u) or a end or,
y is either h(v) or b end or
end where
is equivalent to the four equations
f(h(u), h(v)) = g(h(u));
f(h(u), b) = g(h(u));
f(a, h(v)) = g(a);
f(a, b) = g(a)
Notice that the effect is multiplicative, so a combinatorial explosion may result
from several disjunctive where clauses in the same equation. Such an explosion
only occurred once in our experience with the interpreter, but when it did, it was
disastrous. At the end of Section 18.2.3 we discuss briefly the prospects for
avoiding the disjunctive explosion without returning to bottom-up pattern matching,
with its own explosive problems.
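The multiplicative expansion can be sketched as a cross product over the alternatives. The string-template encoding below is a toy assumption for illustration, not the interpreter's representation of equations:

```python
from itertools import product

def expand(lhs, rhs, alternatives):
    """Expand an equation with disjunctive where clauses into its
    complete alternatives: one equation per choice of alternative
    for each qualified variable."""
    equations = []
    for choice in product(*alternatives.values()):
        binding = dict(zip(alternatives.keys(), choice))
        substitute = lambda s: "".join(binding.get(ch, ch) for ch in s)
        equations.append((substitute(lhs), substitute(rhs)))
    return equations

# f(x, y) = g(x) where x is either h(u) or a, and y is either h(v) or b
for left, right in expand("f(x,y)", "g(x)",
                          {"x": ["h(u)", "a"], "y": ["h(v)", "b"]}):
    print(left, "=", right)
```

With several disjunctive clauses in one equation, `product` makes the combinatorial explosion explicit: the number of generated equations is the product of the numbers of alternatives.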
18.2.3. Flattened Pattern Matching
The final pattern-matching method, used in the current version of the interpreter,
uses only one string per tree pattern. Tree patterns are flattened into preorder
strings, omitting variables. The Aho-Corasick algorithm [AC75] is used to produce
a finite automaton recognizing those strings. Each state in the automaton is anno-
tated with a description of the tree moves needed to get to the next symbol in the
string, or the pattern that is matched, if the end of the string has been reached.
Such descriptions need only give the number of edges (≥0) to travel upwards
toward the root, and the left-right number of the edge to follow downwards. For
example, the patterns (equation left-hand sides) f(f(a,x),g(a,y)) and g(x,b)
generate the strings ffaga and gb, and the automaton given in Figure 18.2.3.1.
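The flattening step can be sketched as follows, using nested tuples for patterns and a set of variable names, both conventions assumed here for illustration. Each annotation pairs the number of edges to travel up with the single edge to follow down, as described above.

```python
def flatten(pattern, variables):
    """Flatten a tree pattern into its preorder string of non-variable
    symbols, together with the tree move (edges up, edge down) leading
    from each symbol to the next."""
    symbols, positions = [], []
    def walk(node, path):
        head, sons = (node[0], node[1:]) if isinstance(node, tuple) else (node, ())
        if head in variables:
            return                            # variables are omitted
        symbols.append(head)
        positions.append(path)
        for i, son in enumerate(sons, start=1):
            walk(son, path + (i,))
    walk(pattern, ())
    moves = []
    for here, there in zip(positions, positions[1:]):
        common = 0                            # longest shared ancestor path
        while common < min(len(here), len(there)) and here[common] == there[common]:
            common += 1
        moves.append((len(here) - common, there[common]))
    return "".join(symbols), moves

print(flatten(("f", ("f", "a", "x"), ("g", "a", "y")), {"x", "y"}))
# ('ffaga', [(0, 1), (0, 1), (2, 2), (0, 1)])
print(flatten(("g", "x", "b"), {"x"}))
# ('gb', [(0, 2)])
```

Because variables are leaves, the next preorder symbol is always reachable by some number of upward edges followed by exactly one downward edge, which is why a single down edge per annotation suffices.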
The automaton cannot be annotated consistently if conflicting moves are associated
with the same state. Such conflicts occur precisely when there exist preorder
flattened strings of the forms αβγ and βδ, such that the annotations on the last
symbol of β in the two strings are different. These differences are discovered
directly by attempts to reassign state annotations in the automaton when α is the
empty string, and by comparing states at opposite ends of failure edges when α is
not empty. Fortunately, the cases in which the flattened pattern-matching automaton
cannot be annotated properly are exactly the cases in which equations violate
the restrictions of Section 5. When γ and δ are not empty, the conflicting annotations
are both tree moves, and indicate a violation of the left-sequentiality restriction
5. When one of γ, δ is the empty string, the corresponding annotation reports a
match, and there is a violation of restriction 3 or 4. In the example above, there is
→ forward edge
⇢ failure edge
failure edges not shown all lead to the start state
u1 means move up one level in the tree
d1 means move down to son number 1
m1 means a match of pattern number 1
Figure 18.2.3.1
a conflict with α=ffa, β=g, γ=a, δ=b. That is, after scanning ffag, the first
pattern directs the traversal down edge number 1, and the second pattern directs the
traversal down edge number 2. This conflict is discovered because there is a failure
edge between states with those two annotations.
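The discovery of such conflicts can be sketched on top of the Aho-Corasick construction. The encoding below is an illustrative assumption: each pattern supplies its flattened string together with one annotation per symbol read, either a tree move (up, down) or a match marker, and a conflict is reported when the same state is assigned two different annotations, or when the two ends of a failure edge disagree.

```python
from collections import deque

def conflicts(patterns):
    """patterns: list of (flattened string, annotations), where
    annotations[i] applies after reading i+1 symbols and the last
    entry is a match marker.  Returns a (state, annotation,
    annotation) triple for each conflict found."""
    goto, ann, bad = [{}], {}, []
    for string, notes in patterns:
        state = 0
        for symbol, note in zip(string, notes):
            if symbol not in goto[state]:
                goto[state][symbol] = len(goto)
                goto.append({})
            state = goto[state][symbol]
            if state in ann and ann[state] != note:
                bad.append((state, ann[state], note))     # alpha empty
            ann[state] = note
    # failure links by breadth-first search over the trie
    fail, queue = {0: 0}, deque()
    for s in goto[0].values():
        fail[s] = 0
        queue.append(s)
    while queue:
        u = queue.popleft()
        for symbol, v in goto[u].items():
            f = fail[u]
            while f and symbol not in goto[f]:
                f = fail[f]
            target = goto[f].get(symbol, 0)
            fail[v] = target if target != v else 0
            queue.append(v)
            if fail[v] and ann[v] != ann[fail[v]]:
                bad.append((v, ann[v], ann[fail[v]]))     # alpha nonempty
    return bad

# f(f(a,x),g(a,y)) and g(x,b): after ffag the first pattern goes down
# edge 1, while the state for g, reached across a failure edge, says
# down edge 2.
found = conflicts([("ffaga", [(0, 1), (0, 1), (2, 2), (0, 1), ("match", 1)]),
                   ("gb",    [(0, 2), ("match", 2)])])
print(found)   # [(4, (0, 1), (0, 2))]
```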
The restriction imposed on equations by the pattern-matching strategy above
may be justified in a fashion similar to the justification of deterministic parsing
strategies. That is, we show that the algorithm succeeds (generates no conflicts)
on every set of equations that is strongly left-sequential according to the abstract
definition of strong left-sequentiality in Section 17.3.
Theorem 18.2.3.1
The flattened pattern-matching algorithm succeeds (i.e., generates no conflicts) if
and only if the input patterns are left-hand sides of a regular and strongly left-
sequential set of equations.
Proof sketch:
(⇒) If the pattern-matching automaton is built with no conflicts, then L of
Definition 17.3.2 may be defined to be the set of all left-traversal contexts ⟨x,l⟩
such that l is the root of a redex in x, and l is visited by the automaton, when
started at the root of x.
(⇐) If a conflict is found in the pattern-matching automaton, then there are two
flattened preorder strings αβγ and βδ derived from the patterns, with conflicting
tree moves from β to γ and from β to δ. Without loss of generality, assume that
there are no such conflicts within the two occurrences of β. αβ, with its associated
tree moves, defines a context x, which is the smallest left context allowing the
traversal specified by αβ. β defines a smaller left-traversal context ρ in the same
way. ρ is contained as a subterm in x, in such a way that the last nodes visited in
the two traversals coincide. If one or both of γ, δ is empty, then x demonstrates a
violation of restriction (4) or (3), respectively. So, assume that γ, δ are not empty,
and the annotations at the ends of the βs are both tree moves.
Consider the two positions to the right of x specified by the two conflicting traversal
directions for αβ and β. Expand x to ψ by filling in the leftmost of these two
positions with an arbitrary redex, and let n be the root of this added redex. Let
equ₁ be the equation associated with whichever of αβγ, βδ directed traversal
toward this leftmost position, and let equ₂ be the equation associated with the
remaining one of αβγ, βδ. ⟨ψ,n⟩ cannot be chosen in L, because there is an
instance ψ′ of ψ in which a redex occurs above n matching the left-hand side of
equ₂, and ψ′ may be ω-reduced to normal form at this redex, without reducing the
redex at n in ψ′. ⟨ψ,n⟩ cannot be omitted from L, because there is another
instance ψ″ of ψ in which everything but n matches the redex associated with equ₁,
and n is therefore root-essential to ψ″.
For example, the pair of equation left-hand sides f(g(x,a),y) and g(b,c) have the
preorder strings fga and gbc. A conflict exists with α=f, β=g, γ=a, δ=bc. The
first equation directs the traversal down edge 2 after seeing fg, and the second
equation directs it down edge 1. The conflicting prefixes fg and g produce the
context f(g(ω,ω),ω). The context above is expanded to the left-traversal context
consisting of f(g(g(b,c),ω),ω) with the root of g(b,c) specified. This left-traversal
context cannot be chosen in L (i.e., it is not safe to reduce the redex g(b,c) in this
case), because the leftmost ω could be filled in with a to produce
f(g(g(b,c),a),ω), which is a redex of the form f(g(x,a),y), and can be
ω-reduced to normal form in one step, ignoring the smaller redex g(b,c). But, this
left-traversal context may not be omitted from L (i.e., it is not safe to omit
reducing g(b,c)), because the leftmost ω may also be filled in with c to produce
f(g(g(b,c),c),ω), and reduction of g(b,c) is essential to get a normal form or a
root-stable term.
The flattened pattern-matching method is so simple, and so well suited to the
precise restrictions imposed for other reasons, that it was chosen as the standard
pattern matcher for the equation interpreter. In fact, this method developed from
an easy way to check the restrictions on equations, which was originally intended to
be added to the top-down method. As long as the interpreter runs on a conventional
sequential computer, the flattened pattern-matching method will probably be used
even when the interpreter is expanded to handle nonsequential sets of equations. In
that case, conflicting annotations will translate into fork operations. By exploiting
sequentiality insofar as it holds, the overhead of keeping track of multiple processes
will be minimized. An interesting unsolved problem is to detect and exploit more
general forms of sequentiality than left-sequentiality in an efficient manner. There
are likely to be NP-hard problems involved.
The flattened method has the same shortcoming as the top-down method with
respect to disjunctive where clauses. The current version merely treats a qualified
pattern as if it were several simple patterns, causing an exponential explosion in the
worst case. Brief reflection shows that the disjunctive patterns are, in effect,
specifications of nondeterministic finite automata. The current version translates
these into deterministic automata, preserving whatever portions are already deter-
ministic in an obvious way. Usually the translated automaton is far from minimal.
In most cases, there exists a deterministic automaton whose size is little or no
greater than the nondeterministic one. A good heuristic for producing the minimal,
or a nearly minimal, deterministic automaton directly, without explicitly creating
the obvious larger version, would improve the practicality of the interpreter
significantly. We emphasize the fact that this technique would probably only be
heuristic, since the problem of minimizing an arbitrary nondeterministic finite
automaton, or producing its minimal deterministic equivalent, provides a solution to
the inequivalence problem for nondeterministic finite automata, which is
PSPACE-complete [GJ79].
We have discussed the incremental operation of each pattern matcher at
matching time, as changes are made to the subject. As a user edits an equational
program, particularly with the modular constructs of Section 14, it is desirable to
perform incremental preprocessing as well, to avoid reprocessing large amounts of
unchanged patterns because of small changes in their contexts. Robert Strandh is
currently studying the use of suffix (or position) trees [St84] to allow efficient
incremental preprocessing for the Aho-Corasick automaton, used in both the top-
down and flattened pattern matchers. We have an incremental processor that
allows insertion and deletion of single patterns for a cost linear in the size of the
change, independent of the size of the automaton. We hope to develop a processor
to combine two sets of patterns for a cost dependent only on their interactions
through common substrings, and not on their total sizes.
18.3. Selecting Reductions in Nonsequential Systems of Equations
In principle, nonsequential equational programs, such as those including the paral-
lel or, may be interpreted by forking off a new process each time the flattened
pattern-matching automaton indicates more than one node to process next. The
overhead of a general-purpose multiprocessing system is too great for this applica-
tion. So, an implementation of an interpreter for nonsequential equations awaits a
careful analysis of a highly efficient algorithm and data structure for managing
these processes. Several interesting problems must be handled elegantly by this
algorithm and data structure. First, whenever a redex α is found, all processes
seeking redexes within α must be killed, since the work that they do may not make
sense in the reduced expression. This problem requires some structure keeping
track of all of the processes in certain subtrees. Substantial savings may be
gained by having that structure ignore tree nodes where there is no branching of
parallel processes.
Because of the sharing of identical subexpressions, two processes may wander
into the same region, requiring some locking mechanism to avoid conflicts. It is not
sufficient to have a process merely lock the nodes that it is actually working on,
since the result would be that the locked out process would follow the working pro-
cess around, duplicating a lot of traversal effort. On the other hand, it is too much
to lock a process out of the entire path from the root followed by another process,
since the new process might be able to do a reduction above the intersection of
paths, as a result of information found within the intersection.
Example 18.3.1
Consider the equations
f(g(h(x, y))) = x;
or(true, x) = true; or(x, true) = true
Suppose that we are reducing an intermediate expression of the form
or(f(g(h(true,a))), g(h(true,a))), with the common subexpression g(h(true,a))
shared. At the or node, a process A will start evaluating the left-hand subexpression
f(g(h(true,a))), and another process B will start evaluating the right-hand
subexpression g(h(true,a)). Perhaps process B reaches the common subexpression
g(h(true,a)) before A, and eventually transforms a into a′. If A waits at the
f node, it will miss the fact that there is a redex to be reduced. A needs to go into
the common subexpression just far enough to see the symbols g, h, and true.
Then, f(g(h(true,a′))) will be replaced by true, yielding or(true, g(h(true,a′))),
which reduces to true. At this point, process B is killed. If A waits on B, it may
wait arbitrarily long, or even forever. Figure 18.3.1 shows the expression discussed
above in tree form, with the interesting positions of A and B.
□
So, a process A wandering onto the path of process B must continue until it
reaches the same state as B. At that point, A must go to sleep, until B returns to
the node where A is. These considerations require a mechanism for determining
which processes have visited a node, and in what states, and what processes are
sleeping at a given node.
Of course, a low-overhead sequencer for any set of equations could be devised,
by adding each newly discovered redex to a queue, and always reducing the head
redex in that queue. The resulting complete reduction sequence [O’D77] is very
wasteful of reduction effort, and probably would not be acceptable except as a tem-
porary experimental tool.
18.4. Performing a Reduction Step
At first, the implementation of reduction steps themselves seems quite elementary.
Simply copy the right-hand side of the appropriate equation, replacing variable
instances by pointers to the corresponding subexpressions from the left-hand side.
Right-hand side variables may easily be replaced during preprocessing by the
addresses of their instances on the left-hand side, so that no search is required.
For example, f(g(a,x),y) = h(x,y) may be represented by
f(g(a,?),?) = h(⟨1,2⟩,⟨2⟩), indicating that the first argument to h is the
2nd son of the 1st son of the redex node, and the second argument is the 2nd son
of the redex node. Given these addresses, we no longer need the names of the
variables at all. The symbol h may be overwritten on the physical node that formerly
contained the symbol f. One small, but crucial, problem arises even with this
Figure 18.3.1
simple approach. If there is an equation whose right-hand side consists of a single
variable, such as car[(x . y)] = x, then there is no appropriate symbol to overwrite
the root node of the redex (in this case containing the symbol car). Consider the
the expression f[car[(a . b)]], for example, which reduces to f[a]. We could modify
the node associated with f, so that its son pointer goes directly to a, but then any
other nodes sharing the subexpression car[(a . b)] would not benefit by the reduction.
Finding and modifying all of the fathers of car[(a . b)], on the other hand,
might be quite expensive. The solution chosen [O’D77] is to replace the car node
by a special dummy node, with one son. Every time a pointer is followed to that
dummy node, it may be redirected to the son of the dummy. Eventually, the
dummy node may become disconnected and garbage collected for reuse. In effect,
this solution requires a reduction-like step to be performed for every pointer to the
redex car[(a . b)], but that step is always a particularly simple one, independent of
the complexity of the redex itself.
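Both the path-addressed right-hand sides and the dummy node can be sketched with mutable lists as nodes. The list representation and the "*" marker are assumptions of this illustration, not the interpreter's storage format:

```python
def subterm(node, path):
    """Follow son numbers, e.g. <1,2> = 2nd son of the 1st son."""
    for edge in path:
        node = node[edge]
    return node

def reduce_at(redex, rhs_symbol, argument_paths):
    """Overwrite the redex in place: f(g(a,x),y) = h(x,y) becomes
    h(<1,2>, <2>) after preprocessing."""
    arguments = [subterm(redex, path) for path in argument_paths]
    redex[:] = [rhs_symbol] + arguments

def collapse_to(redex, path):
    """Right-hand side consisting of a single variable: overwrite the
    redex with a dummy node ("*") pointing at the variable's subterm."""
    redex[:] = ["*", subterm(redex, path)]

def chase(node):
    """Redirect through dummy nodes to the real term."""
    while node[0] == "*":
        node = node[1]
    return node

expression = ["f", ["g", ["a"], ["b"]], ["c"]]      # f(g(a,b), c)
reduce_at(expression, "h", [(1, 2), (2,)])
print(expression)                                   # ['h', ['b'], ['c']]

cell = ["car", ["pair", ["a"], ["b"]]]              # car[(a . b)]
collapse_to(cell, (1, 1))                           # the right-hand side is a variable
print(chase(cell))                                  # ['a']
```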
As the left-hand sides of equations become more substantial, the collecting of
pointers to variable instances, by the simple technique above, might become non-
trivially costly. It appears that there are some simple variations in which pointers
to variables would be collected during the pattern-matching traversal. A careful
analysis is needed to determine whether the savings in retraversal costs would pay
for the extra cost of maintaining pointers to variable positions in partially matching
subexpressions, when the matching fails later on. In any case, the retraversal for
variables could certainly be improved to collect all variables in a single retraversal,
rather than going after each variable individually. Somewhat more sophisticated
optimizations would include performing some of the pattern-matching work for
right-hand sides of equations during the preprocessing, and coalescing several
reduction steps into one. Substantial savings might also be achieved by reusing
some of the structure of a left-hand side in building the right. In particular the
common form called tail recursion, in which a recursively defined function appears
only outermost on a right-hand side, may be translated to an iteration by these
means. For example, f(x) = if (p(x), a, f(g(x))) may be implemented by
coalescing the evaluation of an application of f with the subsequent evaluation of
if, and reusing the f node by replacing its argument, instead of the f node itself.
The OBJ project [FGJM85] has achieved substantial speedups with optimizations
of this general sort, but their applicability to the equation interpreter has never
been studied.
Another opportunity for improving performance arises when considering the
strategy for sharing of subexpressions. The minimal sharing strategy should be to
share all substitutions for a given variable x on the right-hand side of an equation.
In fact, it is easier to program an interpreter with this form of sharing than one
that copies out subterms for each instance of x. A natural improvement, not
implemented yet, is to discover all shareable subterms of a right-hand side, during
the preprocessing step. For example, in f(x, y) = g(h(x, y), h(x, y)), the entire
subexpression h(x, y) may be shared, as well as the substitutions for x and y.
Such an optimization could be rather cheap at preprocessing time, using a variation
of the tree isomorphism algorithm [AHU74], and need not add any overhead at
run time. More sophisticated strategies involve dynamic detection of opportunities
for sharing at run time, trading off overhead against the number of evaluation steps
required.
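Discovering the shareable subterms of a right-hand side can be sketched with canonical hashing of subtrees, used here as a lightweight stand-in for the tree-isomorphism algorithm of [AHU74]:

```python
def share(term, table=None):
    """Map every syntactically identical subterm of `term` to a single
    shared node, returning the shared version of `term`."""
    if table is None:
        table = {}
    if isinstance(term, tuple):
        term = (term[0],) + tuple(share(son, table) for son in term[1:])
    return table.setdefault(term, term)

# f(x, y) = g(h(x, y), h(x, y)): the two h(x, y) subterms become one node
rhs = ("g", ("h", "x", "y"), ("h", "x", "y"))
shared = share(rhs)
print(shared[1] is shared[2])   # True
```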
A simple sort of dynamic sharing is implemented in the current version of the
interpreter. This technique is based on the hashed cons idea from LISP, and was
proposed and implemented for the equation interpreter by Paul Golick. Whenever
a new storage node is created during reduction, its value (including the symbol and
all pointers) is hashed, and any identical node that already exists is discovered.
Upon discovery of an identical node, the new one is abandoned in favor of another
pointer to the old one. In order to maximize the benefit of this strategy, reductions
are performed, not by immediate replacement of a left-hand side by a right, but by
setting a pointer at the root of a redex to point to its reduced form. As discovered
during the traversal of an expression, pointers to redexes are replaced by pointers
to the most reduced version created so far. This use of reduction pointers sub-
sumes the dummy nodes discussed at the beginning of this section.
The hashed sharing strategy described above is rather cheap, and it allows
programming techniques such as the automatic dynamic programming of Section
15.4. The results of this strategy are unsatisfyingly sensitive, however, to the phy-
sical structure of the reduction sequence, and the order of evaluation.
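The hashed cons discipline can be sketched as follows; the class below illustrates the idea, and is not Golick's implementation:

```python
class Heap:
    """Create nodes through a hash table keyed on the symbol and the
    identities of the son pointers; when an identical node already
    exists, it is returned instead of a new one."""
    def __init__(self):
        self.table = {}

    def make(self, symbol, *sons):
        key = (symbol,) + tuple(id(son) for son in sons)
        return self.table.setdefault(key, (symbol,) + sons)

heap = Heap()
a = heap.make("a")
first = heap.make("h", a)
second = heap.make("h", a)      # abandoned in favor of the first h(a)
print(first is second)          # True
```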
Example 18.4.1
Consider the equations
f(a) = b;
c = a.
In the expression g(f(c), f(a)), c reduces to a, yielding g(f(a), f(a)). We
might expect the two instances of f(a) to be shared, but the hashed sharing stra-
tegy will not accomplish this. Both nodes containing f existed in the original
expression, and had different son pointers at that time, so that there could be no
sharing. When c is replaced by a, the a occupies a new node, which is hashed,
discovering the existing a node, and creating a sharing of that leaf. The change in
the son pointer in the leftmost f node, however, does not cause that node to be
hashed again, so the fact that it may now be shared with another f node is not
discovered. Thus, two more reduction steps are required to reach the normal form
g(b, b), instead of the one that would suffice with sharing of f (c).
□
Only one reduction step is wasted by the failure of sharing in Example 18.4.1, but
arbitrarily many reduction steps might be involved from f (a) to the normal form b
in a more complex example, and all of those steps would be duplicated.
In order to achieve maximal sharing, we need to rehash a node each time any
of its sons changes. This requires a more sophisticated use of hashing tables,
allowing deletions as well as insertions, and the details have never been worked out.
Perhaps some technique other than hashing should be applied in this case. Notice
that, by allowing reduced terms to remain in memory, rather than actually replac-
ing them, sharing may be accomplished between subexpressions that never exist
simultaneously in any intermediate expression in a reduction sequence. Thus, no
evaluation will ever be repeated. Such a complete sharing strategy would accom-
plish an even stronger form of automatic dynamic programming than that
developed in Section 15.4, in which the most naive recursive equations would be
applied efficiently, as long as they did not require many different expressions to be
evaluated. Of course, there is a substantial space cost for such completeness. In
practice, a certain amount of space could be allocated, and currently unused nodes
could be reclaimed only when space was exhausted, providing a graceful degrada-
tion from complete sharing to limited amounts of reevaluation.
The sketch of a complete sharing strategy given above resembles, at least
superficially, the directed congruence closure method of Chew [Ch80]. In
congruence closure [NO80], equations are processed by producing a congruence
graph, each node in the graph representing an expression or subexpression. As
well as the father-son edges defining the tree structure of the expressions, the
congruence graph contains undirected edges between expressions that are known to
be equal. Initially, edges are placed between the left- and right-hand sides of equa-
tions, then additional edges are added as follows:
1. (Transitivity) If there are edges from α to β and from β to γ, add an edge
from α to γ.
2. (Congruence) If there are edges from αᵢ to βᵢ for all i from 1 to n, add an
edge from f(α₁, …, αₙ) to f(β₁, …, βₙ).
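For ground terms (no variables), the two rules can be sketched with a union-find structure: transitivity is the union operation, and congruence is applied by re-merging until a fixed point. This naive quadratic sketch ignores the efficient algorithms of [NO80, DST80]:

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def congruence_closure(subterms, equations):
    """subterms: every node of the expressions involved; equations:
    pairs of equal ground terms.  Returns the union-find structure."""
    classes = UnionFind()
    for left, right in equations:
        classes.union(left, right)
    changed = True
    while changed:                     # iterate rule 2 to a fixed point
        changed = False
        applications = [t for t in subterms if isinstance(t, tuple)]
        for s in applications:
            for t in applications:
                if (classes.find(s) != classes.find(t)
                        and s[0] == t[0] and len(s) == len(t)
                        and all(classes.find(x) == classes.find(y)
                                for x, y in zip(s[1:], t[1:]))):
                    classes.union(s, t)
                    changed = True
    return classes

# From a = b we may conclude f(a) = f(b) by the congruence rule.
uf = congruence_closure(["a", "b", ("f", "a"), ("f", "b")], [("a", "b")])
print(uf.find(("f", "a")) == uf.find(("f", "b")))   # True
```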
A carefully designed algorithm for congruence closure [NO80, DST80] may be
quite efficient for equations without variables, but must be modified for equations
with variables, since there is no bound on the size of the graph that must be
treated. The directed congruence closure method of Chew uses directed edges to
indicate reductions, and adds nodes to the congruence graph only as they are
needed to make progress toward a normal form. Every time a reduction edge is
added, congruence closure is applied to the whole graph. Directed congruence clo-
sure was shown by Chew to avoid all reevaluation of the same expression. We con-
jecture that the rehashing strategy for dynamic sharing, sketched above, is essen-
tially an optimization of the directed congruence closure method, in which closure
is applied only to portions of the graph that turn out to be needed for progress
toward a normal form. A careful implementation of some form of congruence clo-
sure would be an extremely valuable option in a future version of the equation
interpreter. For cases where repeated evaluation is not expected to arise anyway,
its overhead should be avoided by applying a less sophisticated sharing strategy.
Independently of the various issues discussed above, the equation interpreter
needs a way to reclaim computer memory that was allocated to a subexpression
that is no longer relevant. We tried the two well-known strategies for reclaiming
memory: garbage collection and reference counting. The first implementations
used a garbage collector. That is, whenever a new node must be allocated, and
there is no free space available, the whole expression memory is traversed, detect-
ing disconnected nodes. Garbage collection has the advantage of no significant
space overhead, and no time wasted unless all storage is actually used. Unfor-
tunately, as soon as we ran large inputs to the interpreter, garbage collection
became unacceptably costly. Typical garbage collections would only free up a
small number of nodes, leading to another garbage collection with rather little
reduction in between. In fact, it was usually faster to kill a computation, recompile
the interpreter with a larger memory allotment, and start the program over, than to
wait for a space-bound program to finish. Based on this experience we chose the
reference count strategy, in which each node contains a count of the pointers to it.
When that count reaches zero, the node is instantly placed on a free list. Refer-
ence counting has a nontrivial space overhead, and adds a small time cost to each
creation and deletion of a node. Unlike garbage collection, it does not cause all of
the active nodes to be inspected in order to reclaim inactive ones.
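A minimal sketch of the reference-count discipline with a free list; the node layout and the acquire/release protocol are assumptions of this illustration, not the interpreter's storage format:

```python
class Node:
    __slots__ = ("symbol", "sons", "refs")

free_list = []

def make(symbol, *sons):
    """Allocate from the free list when possible; the creator holds
    one reference to the new node."""
    node = free_list.pop() if free_list else Node()
    node.symbol, node.refs = symbol, 1
    node.sons = list(sons)
    for son in sons:
        son.refs += 1
    return node

def release(node):
    """Drop one reference; at zero, release the sons and put the node
    straight onto the free list -- no traversal of active nodes."""
    node.refs -= 1
    if node.refs == 0:
        for son in node.sons:
            release(son)
        node.sons = []
        free_list.append(node)

a = make("a")
f = make("f", a)        # a is now held by its creator and by f
release(a)              # creator's reference dropped; a survives inside f
release(f)              # both nodes become instantly reusable
print(len(free_list))   # 2
```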
19. Toward a Universal Equational Machine Language
While the equation interpreter project has attempted to provide an efficient imple-
mentation for the widest possible class of equational programs, other researchers
have sought a fixed set of primitive functions defined by equations as a universal
programming language. Pure LISP [McC60] may be viewed as a particular equa-
tional program, defining a general purpose programming language interpreter.
More recently, Backus [Ba78] has defined a Functional Programming language by
a fixed set of equations. Turner [Tu79] suggests the Combinator Calculus as a
universal language, into which all others may be compiled. There are a number of
attractions to a fixed, universal, equationally defined programming language:
1. The designer of such a language may choose primitives that encourage a par-
ticular programming style.
2. A well-chosen set of equations might be implemented by special-purpose tech-
niques more efficient than the more general techniques used in the equation
interpreter.
3. A particular equationally defined programming language might provide the
machine language for a highly parallel computer.
While agreeing with the motivation for choosing a fixed set of equations, we believe
that the criteria for such a choice are not well enough understood to allow it to be
made on rational grounds. Aside from the subjective issues of convenience, and the
technology-dependent issues of efficiency, there is no known method for establishing
the theoretical sufficiency of a particular equational language to simulate all others.
In this section, we investigate the theoretical foundations for universal equational
programs, and produce evidence that the Combinator Calculus is not an appropri-
ate choice. Since FP, SASL, and many similar proposed languages compile into
combinators, they are also insufficient. Unfortunately, we have not found an
acceptable candidate to propose in its place, but we can characterize some of the
missing qualities that must be added.
First, consider the more usual procedural languages and their sequential
machines. Accept for the moment the architectural schema of a Random Access
Machine, with an unbounded memory accessed by a central processor capable of
performing some finite set of primitive operations between finite sequences of stored
values. Each choice of primitive-operation set determines a programming language
and a machine capable of executing it directly. In order to build a general-purpose
machine, we usually choose a set of primitive operations that is universal in the
sense that every other finite set of computable operations may be compiled into the
universal one. In theory textbooks, we often state only the result that some univer-
sal set of operations is sufficient to compute all of the computable functions. In
fact, we usually expect, in addition, that compiling one operation set into another
has low complexity, and that the compiled program not only produces the same
result as the source program, but does so by an analogous computation. The low
complexity of the compilation is not usually stated formally, but the analogousness
of the computation is often formalized as stepwise simulation.
A reasonable-seeming candidate for a universal reduction language is the
S—K Combinator Calculus, a language with the two nullary symbols S and K,
plus the binary symbol AP for application. For brevity, AP(α,β) is written αβ and
parentheses are to the left unless explicitly given. Reduction in the combinator cal-
culus is defined by
Kαβ → α
Sαβγ → αγ(βγ)
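A leftmost-outermost reducer for these two rules can be sketched as follows; terms are the symbols "S" and "K" or application pairs (a, b), an encoding assumed for this illustration:

```python
def step(term):
    """One leftmost-outermost step, or None if `term` is in normal form."""
    spine, args = term, []
    while isinstance(spine, tuple):
        args.append(spine[1])
        spine = spine[0]
    args.reverse()                        # term == spine applied to args
    if spine == "K" and len(args) >= 2:   # K a b -> a
        result, rest = args[0], args[2:]
    elif spine == "S" and len(args) >= 3: # S a b c -> a c (b c)
        a, b, c = args[:3]
        result, rest = ((a, c), (b, c)), args[3:]
    else:
        for i, arg in enumerate(args):    # no head redex: try the arguments
            reduced = step(arg)
            if reduced is not None:
                args[i] = reduced
                result, rest = spine, args
                break
        else:
            return None
    for arg in rest:                      # rebuild the remaining applications
        result = (result, arg)
    return result

def normalize(term, limit=10_000):
    for _ in range(limit):
        next_term = step(term)
        if next_term is None:
            return term
        term = next_term
    raise RuntimeError("no normal form within the step limit")

# S K K behaves as the identity combinator:
identity = (("S", "K"), "K")
print(normalize((identity, "S")))   # S
```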
The Combinator Calculus is well-known to be capable of defining all of the com-
putable functions [Ch41, CF58, St72], and has been proposed as a machine
language [Tu79, CG80]. Certain computations, however, apparently cannot be
simulated by this calculus.
Consider a language containing the Boolean symbols T and F, and the paral-
lel or combinator D, with the rules
DTα → T
DαT → T
DFF → F
Intuitively, in order to evaluate Dαβ we must evaluate α and β in parallel, in case
one of them comes out T while the other is undefined. On the other hand, it is
possible to evaluate combinatory S—K expressions in a purely sequential fashion,
by leftmost-outermost evaluation [CF58]. Thus, the only way to simulate the D
combinator in the S—K calculus seems to be to program what is essentially an
operating system, simulating parallelism by time-sliced multiprogramming. Such a
simulation appears to destroy the possibility of exploiting the parallelism in D, and
can hardly be said to produce an analogous computation to the original.
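The need for genuine interleaving can be sketched by modelling each argument of D as a stream of its successive reduced forms and advancing both streams in lockstep; the stream encoding is an assumption of this illustration, not a proposed implementation:

```python
from itertools import count

def parallel_or(steps_a, steps_b, limit=10_000):
    """Advance both arguments one reduction step at a time; answer T
    as soon as either side reaches T, F only when both reach F."""
    a = b = None
    for _ in range(limit):
        a = next(steps_a, a)        # a finished stream keeps its last value
        b = next(steps_b, b)
        if a == "T" or b == "T":
            return "T"
        if a == "F" and b == "F":
            return "F"
    raise RuntimeError("no answer within the step limit")

def diverge():
    for n in count():               # an endless reduction sequence
        yield ("spin", n)

def converge_to(value, after):
    for n in range(after):
        yield ("spin", n)
    yield value

# A sequential evaluator that waited on the left argument would loop
# forever here; lockstep evaluation answers T.
print(parallel_or(diverge(), converge_to("T", after=50)))   # T
```

Exhausting `steps_a` before looking at `steps_b` corresponds to the purely sequential leftmost-outermost strategy, which diverges on this example.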
This section formalizes the concept of simulation of one reduction system by
another, and studies the powers of the S—K combinator calculus and its extensions
by the parallel or (D) and arbitrary choice (A) operators. Section 19.1 defines
reduction systems in a natural and very general way, and defines the confluence
(Church-Rosser) property that holds for certain reduction systems. Section 19.2
develops useful properties of the combinator calculi. Section 19.3 defines simula-
tion of one reduction system by another, gives examples of plausible simulations,
and shows that a weaker definition allows intuitively unacceptable "simulations."
Section 19.4 shows that the S—K calculus does not simulate the S—K—D calculus,
and that the S—K—A calculus is universal. Section 19.5 shows that the S—K cal-
culus simulates all simply strongly sequential systems. Section 19.6 shows that the
S—K—D calculus simulates all regular systems.
19.1. Reduction Systems
The equational programs discussed in this book are viewed through the formalism
of term reduction systems, presented in Section 17.1. The theoretical foundations
for studying simulations of reduction systems seem to require a more general
framework, where the states of a computation do not necessarily take the form of
terms. Reduction systems are a more general class of formal computational struc-
tures. The essence of a reduction system is a set of possible states of computation,
and a relation that determines the possible transitions from one state to the next.
States with no possible transitions are called normal forms, and represent situations
in which the computation halts. There is no loss of generality in assuming that, in
any state with a possible transition, some transition is taken.
Definition 19.1.1
A reduction system is a pair <S, →>, where
S is a set of states
→ ⊆ S×S is a binary relation on S, called reduction.
In most cases, we refer to the system <S, →> as S, and use the same → symbol
in different contexts to indicate the reduction relation of different systems.
Such a reduction system is effective if
1. S is decidable (without loss of generality, S may be the nonnegative integers),
2. {β | α→β} is finite for all α, and the branching function n(α) = |{β | α→β}| is
total computable,
3. the transition function t:S→P(S) defined by t(α) = {β | α→β} is total comput-
able.
□
Intuitively, α→β means that a computation may go from α to β in one step.
Definition 19.1.2
Let <S, →> be a reduction system.
η∈S is a normal form if there is no γ such that η→γ.
N_S = {η∈S | η is a normal form}
□
Definition 19.1.3
A reduction system <S, →> is confluent if
∀α,β,γ∈S  (α→*β & α→*γ) => ∃δ∈S (β→*δ & γ→*δ)
(See Figure 19.1.1)
□
The confluence property is often called the Church—Rosser property, since Church
and Rosser established a similar property in the λ calculus. The confluence pro-
perty is important because it guarantees that normal forms are unique, and that
normal forms may be found by following the → relation in the forward direction
only. For example, consider a reduction system with states α, β, γ, δ₁, δ₂, ···, and
reduction relation defined by α→β, α→γ, α→δ₁, δᵢ→δᵢ₊₁. β and γ are the only
normal forms. See Figure 19.1.2 for a picture of this reduction system. Because of
the failure of the confluence property in this reduction system, α has two different
normal forms, β and γ. Furthermore, δ₁ cannot be reduced to normal form, even
though it is equivalent to the normal forms β and γ according to the natural
Figure 19.1.1
Figure 19.1.2
equivalence relation generated by →. In order to find β or γ from δ₁, we must
take a reverse reduction to α.
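The failure of confluence in this example can be checked mechanically. A sketch (ours), generating the infinite chain δ₀→δ₁→··· on demand, with spelled-out state names and a depth bound:

```python
# A sketch (ours) of the non-confluent example: alpha -> beta, alpha -> gamma,
# alpha -> delta0, and delta_i -> delta_{i+1} forever.

def succ(s):
    if s == "alpha":
        return ["beta", "gamma", "delta0"]
    if s.startswith("delta"):
        i = int(s[5:])
        return ["delta%d" % (i + 1)]   # the infinite chain delta_i -> delta_{i+1}
    return []                          # beta and gamma are normal forms

def reachable_normal_forms(state, depth):
    """Normal forms reachable from state in at most depth forward steps."""
    nfs, frontier = set(), {state}
    for _ in range(depth + 1):
        nxt = set()
        for s in frontier:
            ss = succ(s)
            if ss:
                nxt.update(ss)
            else:
                nfs.add(s)
        frontier = nxt
    return nfs

print(sorted(reachable_normal_forms("alpha", 50)))   # -> ['beta', 'gamma']
print(sorted(reachable_normal_forms("delta0", 50)))  # -> []
```

Two distinct normal forms are reachable from alpha, and none at all from the delta chain, matching the discussion above.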
19.2. The Combinator Calculus, With Variants
The S—K combinator calculus [Sc24, CF58, St72] was developed by logicians to
demonstrate that the concept of a variable in mathematics may be eliminated in
favor of more primitive concepts. More recently, Turner [Tu79, CG80] has pro-
posed this calculus as a machine language, into which higher level languages may
be compiled. Viewed as a reduction system, the combinator calculus is defined as
follows. The particular reduction relation defined below is sometimes called weak
reduction [St72]. Strong reduction mimics the λ-calculus more closely, but
requires an extended set of terms.
Definition 19.2.1
The S—K calculus is the reduction system <C[S,K], →>, where
C[S,K] = {S,K,AP}_T is the set of all terms built from the constants S and K, and
the binary operation AP (as mentioned before, the AP operation is abbreviated by
juxtaposition);
→ is the least relation satisfying
Kαβ → α
Sαβγ → αγ(βγ)
α→β => αγ→βγ & γα→γβ
for all α,β,γ∈C[S,K].
The S—K—D calculus is the reduction system <C[S,K,D], →> obtained from
the S—K calculus by adding the constant symbol D and augmenting the relation
→ with the rules
DKα → K
DαK → K
D(K(SKK))(K(SKK)) → K(SKK)
(K represents truth and K(SKK) represents falsehood)
The S—K—A calculus is the reduction system <C[S,K,A], →> obtained from
the S—K calculus by adding the constant symbol A and augmenting the relation
→ with the rules
Aαβ → α
Aαβ → β
□
The S—K calculus is definable in the current version of the equation interpreter
(see Section 9.9), and the S—K—D calculus will be handled by a future version.
The S—K—A calculus is unlikely to be supported, because of its inherent
indeterminacy. See Section 15.2 for a discussion of the difficulties in dealing with
indeterminate constructs.
Conventional presentations of combinators usually include the additional sym-
bol I, with the rule Iα→α. For symbolic parsimony, we omit the I, since its effect
may be achieved by SKK, as SKKα → Kα(Kα) → α. The following properties of
the S—K calculus are well known [St72], and clearly hold for S—K—D and S—K—A as
well.
Lemma 19.2.1
Let I = SKK.
Let α be a term built from S, K, D, A, the variables x, y, z, ···, and the binary
operation AP (variables are not allowed in C[S,K], C[S,K,D], C[S,K,A] as
defined above).
Let α[β/x] be the result of replacing each occurrence of the variable x in α by β.
Let λx.α be defined by
λx.x = I
λx.y = Ky for x ≠ y
λx.S = KS
λx.K = KK
λx.D = KD
λx.A = KA
λx.(αβ) = S(λx.α)(λx.β)
Then, (λx.α)β →* α[β/x] for all α∈C[S,K,D]∪C[S,K,A].
□
Notice that the definitions above translate all λ terms into combinatory terms with
variables, and all λ terms with no free variables into C[S,K].
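The abstraction algorithm of the lemma is short enough to execute. A sketch in Python (the term representation, reducer, and names are ours), checking that (λx.fxx)a reduces to faa:

```python
# A sketch (ours) of the abstraction algorithm of Lemma 19.2.1: lam(x, a)
# builds a combinator term with (lam(x, a) b) ->* a[b/x].  Terms are atoms
# (strings) or application pairs ("ap", f, g).

I = ("ap", ("ap", "S", "K"), "K")   # I = SKK

def lam(x, a):
    if a == x:
        return I                                  # lambda x. x = I
    if isinstance(a, str):
        return ("ap", "K", a)                     # lambda x. y = K y  (x /= y)
    return ("ap", ("ap", "S", lam(x, a[1])), lam(x, a[2]))  # S (lx.f) (lx.g)

def step(t):
    """One leftmost-outermost step of weak reduction, or None at normal form."""
    if isinstance(t, tuple):
        f, arg = t[1], t[2]
        if isinstance(f, tuple) and f[1] == "K":            # K a b -> a
            return f[2]
        if (isinstance(f, tuple) and isinstance(f[1], tuple)
                and f[1][1] == "S"):                        # S a b c -> a c (b c)
            a, b, c = f[1][2], f[2], arg
            return ("ap", ("ap", a, c), ("ap", b, c))
        for i in (1, 2):
            r = step(t[i])
            if r is not None:
                return t[:i] + (r,) + t[i+1:]
    return None

def normalize(t, limit=10000):
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no normal form within limit")

term = ("ap", lam("x", ("ap", ("ap", "f", "x"), "x")), "a")
print(normalize(term))  # -> ('ap', ('ap', 'f', 'a'), 'a')
```

As the lemma promises, abstraction followed by application performs substitution, here yielding the term f a a.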
Lemma 19.2.2 [Ch41, St72]
Let φ be any acceptable indexing of the partial recursive functions.
There is a total computable function ‾ from the nonnegative integers (N) to the
normal forms of C[S,K] (N_C[S,K]) and a term υ∈N_C[S,K], such that
υ ī j̄ →* φᵢ(j)‾ for all i,j∈N.
In particular, the function ‾ may always be defined by
ī = λx.λy.x(x(···(xy)···)), where the number of occurrences of the variable x
applied to y is i [Ch41].
□
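Church's numeral encoding ī from the lemma can be sketched with Python lambdas; `unchurch`, `add` and `mul` are our own helpers, not part of the lemma.

```python
# A sketch (ours) of Church numerals: the numeral for i applies its first
# argument i times to its second, exactly as in Lemma 19.2.2.

def church(i):
    return lambda f: lambda y: y if i == 0 else f(church(i - 1)(f)(y))

def unchurch(n):
    return n(lambda k: k + 1)(0)

add = lambda m: lambda n: lambda f: lambda y: m(f)(n(f)(y))
mul = lambda m: lambda n: lambda f: m(n(f))

print(unchurch(add(church(3))(church(4))))  # -> 7
print(unchurch(mul(church(3))(church(4))))  # -> 12
```

Arithmetic falls out of the encoding itself: addition composes the iterations, multiplication iterates an iteration.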
Lemma 19.2.3 [CF58, Kl80a]
There is a term μ∈C[S,K] that implements the least fixpoint function. That is,
μα →* α(μα) for all α∈C[S,K].
μ may be used to construct α₁, ···, αₘ∈C[S,K] solving any simultaneous recursive
definitions of the form
α₁x₁···xₙ₁ →* β₁
···
αₘx₁···xₙₘ →* βₘ
where each βᵢ is a term built from α₁, ···, αₘ, x₁, ···, xₙᵢ, S, K, D, A and AP.
Specifically, μ = (λx.λf.f(xxf))(λx.λf.f(xxf))
□
Lemma 19.2.1 allows us to define procedures that substitute parameters into terms
with variables. Lemma 19.2.2 guarantees the existence of terms to compute arbi-
trary computable functions on integers, saving us the trouble of constructing them
explicitly. Lemma 19.2.3 lets us construct terms that perform arbitrary rearrange-
ments of their arguments and themselves, even though those arguments may not be
integers. That is, we may write explicitly recursive programs.
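The fixpoint construction can be sketched in Python. Since Python is strict, the sketch below uses an η-expanded variant of the fixpoint combinator rather than μ itself, so that `mu(a)` applied to an argument behaves as `a(mu(a))`; the factorial definition is our illustrative example.

```python
# A sketch (ours) of the fixpoint construction of Lemma 19.2.3, in the
# eta-expanded form suitable for a strict language: no name refers to itself.

def mu(a):
    g = lambda x: a(lambda v: x(x)(v))
    return g(g)

# The recursive definition "fact n = if n == 0 then 1 else n * fact(n-1)",
# solved as a fixpoint without explicit self-reference:
fact = mu(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

print(fact(5))  # -> 120
```

This is exactly the sense in which the lemma lets us "write explicitly recursive programs" without any primitive notion of recursion.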
Although arbitrary data structures, such as lists, may in principle be encoded
as integers, and all computable operations may be carried out on such encodings,
the computation steps involved in manipulations of encoded data structures may
not correspond correctly to the intended ones. So, we define structuring operations
in a new, but straightforward, way.
Definition 19.2.2
T = K
F = KI
P = λx.λy.λz.zxy
L = λx.xT
R = λx.xF
C = λx.λy.λz.x(yz)
M = λx.PxT
<α,β> = λz.zαβ
if α then β else γ = αβγ
□
T and F represent the usual truth values. Pαβ, or <α,β>, represents the ordered
pair of α and β. L and R are the left- and right-projection operations, respec-
tively. Pαβγ represents the conditional term that gives α when γ is T, and β when
γ is F. These observations are formalized by the following straightforward lemma.
Lemma 19.2.4
For all α,β,γ∈C[S,K,D]∪C[S,K,A]:
Pαβ →* <α,β>
L<α,β> →* α
R<α,β> →* β
Pαβγ →* if γ then α else β
if T then α else β →* α
if F then α else β →* β
Cαβγ →* α(βγ)
Mα →* <α,T>
□
The more conventional pairing and projection operators defined on numerical
encodings satisfy similar properties for those α,β,γ that actually represent integers.
This restriction is particularly troublesome, since all integer representations are in
normal form. Thus, integer-encoded operators are strict, while L(Pαβ) as defined
above reduces to α even if β has no normal form.
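The non-strict behavior claimed here can be sketched with Python lambdas; zero-argument thunks stand in for the calculus's non-strict arguments, and all names are ours.

```python
# A sketch (ours) of the pairing operators of Definition 19.2.2.  Arguments
# are passed as zero-argument thunks, so L(P a b) yields a without ever
# touching b -- even if b diverges.

T = lambda a: lambda b: a()               # T = K:  selects its first argument
F = lambda a: lambda b: b()               # F = KI: selects its second argument
P = lambda a: lambda b: lambda z: z(a)(b) # P a b = <a, b> = lambda z. z a b
L = lambda p: p(T)                        # left projection:  <a,b> T -> a
R = lambda p: p(F)                        # right projection: <a,b> F -> b

def diverges():
    raise RuntimeError("no normal form")

pair = P(lambda: "left")(diverges)        # <'left', bottom>
print(L(pair))                            # prints: left
```

R(pair) would raise, mirroring the fact that the right component has no normal form; L(pair) never looks at it.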
Pairing functions may be used to define lists. In order to be able to test a list
for emptiness, we pair up every element with a Boolean value indicating whether
the end of the list has been reached. This is necessary because the property of being
an ordered pair, in the sense of < >, is not testable within the calculus.
Definition 19.2.3
[ ] abbreviates <F,F>
[α₁, α₂, ···, αₙ]
abbreviates <T,<α₁,<T,<α₂, ··· <T,<αₙ,<F,F>>> ··· >>>>, for n≥1.
□
All of the usual list operators may be defined by terms in C[S,K] in such a way
that reduction to normal form produces the same result as a "lazy" or outermost
LISP program [FW76, HM76, O’D77].
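The list encoding can be sketched with ordinary Python pairs standing in for <·,·>; the translation back into combinators is elided, and the helper names are ours.

```python
# A sketch (ours) of the list encoding of Definition 19.2.3: [] is (F, F),
# and [a1, ..., an] is (T, (a1, <rest>)), so emptiness is testable by
# inspecting the first component.

T, F = True, False

def encode(xs):
    out = (F, F)
    for x in reversed(xs):
        out = (T, (x, out))
    return out

def is_empty(l):
    return l[0] is F

def head(l):
    return l[1][0]

def tail(l):
    return l[1][1]

l = encode(["a", "b", "c"])
print(head(l), head(tail(l)), is_empty(encode([])))  # -> a b True
```

Tagging every cons cell with T is what makes the emptiness test expressible, as the paragraph above explains.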
Lemma 19.2.5
The S—K and S—K—D combinator calculi are confluent, but the S—K—A cal-
culus is not.
Proof sketch:
See [CF58, St72] for the S—K calculus. The general results of [O'D77, Kl80a]
cover S—K—D. ATF → T and ATF → F, disproving confluence for S—K—A.
□
Reduction systems, in general, may have no meaningful way of identifying
different reduction steps, other than by the states that they connect. When the
states are terms, as in the three combinatory calculi, and when the definition of the
→ relation is given by rules allowing the replacement of certain subterms, it is
natural and useful to identify reduction steps with the occurrences of subterms that
they replace.
Definition 19.2.4
In any of the three reduction systems defined in this section, a redex is an
occurrence of a subterm that may be replaced by the → relation.
When α→β, a residual of a redex r in α is a redex in β directly resulting from r.
□
For example, in the S—K calculus, a redex is an occurrence of a subterm in one of
the forms Kαβ, Sαβγ. In a reduction of the form α[Sβγδ] → α[βδ(γδ)], the only
residual of a redex within α in the leftmost expression is the redex in exactly the
same position in the rightmost expression. The only residual of a redex within β or
γ is the redex in the corresponding position in the explicit copy of β or γ, and the
two residuals of a redex within δ are the two redexes in corresponding positions in
the two explicit copies of δ. The redex Sβγδ has no residual in this case, because it
is destroyed by the reduction. Residuals generalize naturally to arbitrarily long
sequences of reductions. When it is necessary to trace residuals through reduction
sequences, we will write α →r β to indicate that α reduces to β by replacing r or one
of its residuals in α. For a more precise treatment of residuals, see [HL79,
O'D77]. For the purposes of this section, the intuitive treatment above should
suffice.
Huet and Lévy also define sequentiality for reduction systems such as S—K
and S—K—D, but their definition depends on the term structure of the states in
these systems. See Section 16 for a discussion of sequentiality in term rewriting
systems. We will isolate one consequence of the sequentiality of S—K, and the
nonsequentiality of S—K—D, that may be expressed purely in terms of the reduction
graphs.
Definition 19.2.5
A reduction system <S,→> has property A if there is a function f:N×N→N such
that the following holds for all α,β,γ,δ∈S:
if α is a (not necessarily unique) least common ancestor of β,γ in the graph of →,
and β→ᵐδ, γ→ⁿδ, then there is a reduction sequence α→ᵏδ with k ≤ f(m,n).
□
Intuitively, property A says that the upper reductions shown in Figure 19.2.1 can-
not be too much longer than the lower ones.
Figure 19.2.1
In order to establish property A for the S-K calculus, we need a way of choos-
ing a standard reduction sequence for a particular pair of terms.
Definition 19.2.6 [CF58]
The reduction sequence α₀ →r₁ α₁ →r₂ α₂ ··· →rₘ αₘ is standard if, whenever rᵢ₊₁ is a
residual of a redex s in αᵢ₋₁, rᵢ either contains s as a subterm, or is disjoint from s
and to the left of it. In other words, redexes are reduced starting with the
leftmost-outermost one, and any left-outer redex that is skipped in favor of a right
or inner one will never be reduced.
□
The following lemma is from Curry and Feys [CF58].
Lemma 19.2.8
In the S—K calculus, if α→ⁱβ, then there is a standard reduction of α to β with at
most 2ⁱ steps.
□
Lemma 19.2.9
The S—K—D calculus does not have property A.
The S—K calculus has property A, with the function f(i,j) = 2ⁱ+2ʲ.
Proof sketch:
The S—K—D calculus contains subgraphs of the form described in Definition
19.2.5, with m=n=1, but k arbitrarily large. Let I₀=I, Iᵢ₊₁=IᵢI. Notice that
Iᵢ₊₁→Iᵢ, and there are no other reductions possible on Iᵢ₊₁. Let α=D(IᵢT)(IᵢT),
β=DT(IᵢT), γ=D(IᵢT)T, δ=T. β→¹δ, and γ→¹δ, but α→ⁱ⁺²δ is the shortest
reduction sequence from α to δ.
For the S—K calculus, let α,β,γ,δ∈C[S,K], m,n∈N be as in the statement of
Definition 19.2.5. Since α is a least common ancestor of β,γ, the two reduction
sequences α→*β and α→*γ have no steps in common (i.e., reducing residuals of
the same redex). Let α→*β, α→*γ, α→ᵏδ, β→ᵐ′δ, γ→ⁿ′δ be the standard
reductions. By Lemma 19.2.8, m′≤2ᵐ, and n′≤2ⁿ.
Consider a redex r that is reduced in α→*β, but not in γ→ⁿ′δ. Since r is not
reduced in α→*γ, it must be eliminated in α→*γ→ⁿ′δ by an application of the
K rule, with r inside the discarded argument. In the standard reduction α→ᵏδ, that K-reduction
comes before the reduction of r, so r is not reduced in α→ᵏδ. Thus, every redex
that is reduced in α→*β and in α→ᵏδ must also be reduced in γ→ⁿ′δ. A sym-
metric argument covers α→*γ and β→ᵐ′δ. Every redex reduced in α→ᵏδ comes
from either α→*β or α→*γ, so k ≤ m′+n′ ≤ 2ᵐ+2ⁿ.
□
19.3. Simulation of One Reduction System by Another
In the introduction to Section 19 we argued that, although every function com-
puted in the S—K—D calculus may be computed in the S—K calculus, there are
certain computations, with a parallel flavor, that can be produced by S—K—D but
not by S—K. In this subsection, we propose a definition of simulation for reduction
systems that seems to capture the essential elements of simulations that preserve
the general structure of a computation. As in the definition of stepwise simulation
for conventional random access machines, we associate with each state in a guest
system one or more states in a host system, which will represent the guest state.
The association of guest computation steps with host computation steps is trickier.
It is not appropriate to insist that every guest computation step be associated with
a single contiguous path in the host system, since potentially parallel guest steps,
when interpreted as multistep paths in the host, could well have their individual
steps interleaved. On the other hand, it is not enough merely to require that α→β
in the guest if, and only if, α′→*β′ for associated states in the host, since that
requirement still allows pathological behaviors in the host that do not correspond to
any such behavior in the guest. For example, if there is a large, simple cycle
α₁→α₂→···→αₙ→α₁ in the guest, the host would be allowed to have spurious
reduction paths directly between αᵢ′ and αⱼ′, without involving the appropriate
intermediate steps. The host could even have infinite reductions within equivalence
classes of states representing a single guest state.
The following definition of simulation is not clearly the right one, but it is at
least a very plausible one, and addresses the concerns described above. The posi-
tive and negative results about simulations in Sections 19.4, 19.5 and 19.6 provide
evidence that the definition is reasonable, since they agree with a programming
intuition about what attempts at simulation are and are not acceptable. The intent
of the definition is to capture the idea that the set of possible choices in the host
system must be exactly the same as the set of possible choices in the guest system,
when the differences between different host representations of the same guest state
are ignored. We do not require, however, that decisions in the host be made in the
same order that they are made in the guest. Choices of different host representa-
tions for the same guest state may predetermine future choices between different
guest states. Roughly speaking, the host may be allowed to, but must not be
required to, plan ahead in a computation sequence. Invisible book-keeping steps in
the host are allowed, which do not change the represented guest state, but such
steps are not allowed to grow without bound, else they could surreptitiously simu-
late potential guest computation steps that have not been officially chosen for exe-
cution in the host computation.
Definition 19.3.1
Let <S_g, →_g> (the guest) and <S_h, →_h> (the host) be reduction systems.
S_h weakly simulates S_g if there exist
an encoding set E ⊆ S_h,
a decoding function d: S_h → S_g ∪ {nil} (nil ∉ S_g),
a computation relation →_c ⊆ →_h,
such that
1. d[E] = S_g  &  d⁻¹[N_{S_g}] ∩ E ⊆ N_{S_h}
2. ∀α,β∈S_h  α→_cβ => d(α) →_g d(β)
3. ∀α,β∈S_h  α(→_h − →_c)β => d(α) = d(β)
4. ∀α∈E, β∈S_g  d(α)→_gβ => ∃δ∈E  α(→_h − →_c)* →_c (→_h − →_c)* δ  &  d(δ)=β
5. ∀α∈N_{S_h}  d(α) ∈ N_{S_g} ∪ {nil}
6. There is no infinite →_h − →_c path
<S_h, →_h> simulates <S_g, →_g> if, in addition, there exists a
bound function b:S_g→N such that
6′. ∀α,β∈E  α(→_h − →_c)ᵐ →_c (→_h − →_c)ⁿ β => m ≤ b(d(α)) & n ≤ b(d(β))
A (weak) simulation is effective if the appropriate E, d, →_c, and b are all total
computable.
□
Intuitively, (1) requires that d maps E onto S_g, respecting normal forms.
d(α)∈S_g is the unique expression encoded by α∈S_h, but each β∈S_g may have
infinitely many encodings. The allowance for multiple encodings corresponds with
common practice, where, for example, a stack is encoded by any one of many
arrays containing the contents of the stack, plus extra meaningless values in unused
components of the array. (2), (3) and (4) require that each →_g reduction is simu-
lated by any number of →_h − →_c reductions, which do not change the encoded
expression, followed by exactly one →_c reduction to effect the change in the
encoded expression. (5) prevents dead ends in the simulation.
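For finite reduction systems presented as edge sets, several of the conditions above — (2), (3), (5), and the onto part of (1) — can be checked mechanically. A sketch, with all names and the tiny example system ours:

```python
# A sketch (ours): guest and host reductions as sets of edges, d as a dict,
# E as a set.  Checks conditions (2), (3), (5) and the onto half of (1) of
# Definition 19.3.1.

def normal_forms(states, arrows):
    return {s for s in states if not any(a == s for a, b in arrows)}

def check_weak_simulation(Sg, Ag, Sh, Ah, Ac, E, d):
    book = Ah - Ac                            # invisible book-keeping steps
    ok1 = {d[a] for a in E} == Sg             # (1) d maps E onto the guest
    ok2 = all((d[a], d[b]) in Ag for a, b in Ac)   # (2) ->c steps decode to ->g
    ok3 = all(d[a] == d[b] for a, b in book)       # (3) book-keeping is invisible
    ok5 = all(d[a] in normal_forms(Sg, Ag) | {None}  # (5) no dead ends
              for a in normal_forms(Sh, Ah))
    return ok1 and ok2 and ok3 and ok5

# Guest: 0 -> 1.  Host: two encodings of 0, one book-keeping step between
# them, then a computation step to the encoding of 1.
Sg, Ag = {0, 1}, {(0, 1)}
Sh = {"x0", "x0'", "x1"}
Ac = {("x0'", "x1")}
Ah = {("x0", "x0'")} | Ac
d = {"x0": 0, "x0'": 0, "x1": 1}
E = {"x0", "x1"}

print(check_weak_simulation(Sg, Ag, Sh, Ah, Ac, E, d))  # -> True
```

Conditions (4), (6) and (6′) quantify over reduction sequences and are omitted from this sketch.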
Notice that the effect of (1)-(4) is to select a subset E of S_h to represent S_g,
and divide it into equivalence classes in a one-one correspondence to S_g. A state
α∈S_h − E that is accessible by reductions from a state in E must also be associ-
ated with a unique d(α)∈S_g. Each one-step reduction on S_g is simulated by one
or more reduction sequences between the equivalence classes in E. There may be
→_h reduction sequences that slip between the various classes d⁻¹[β]∩E, but they
still mimic →_g reductions by their behavior on the classes d⁻¹[β]. Alternatively,
think of d⁻¹[β]∩E as the set of canonical encodings of β from which any →_g
reduction may yet be chosen. Members of d⁻¹[β] − E still represent β, but may
require some additional reductions to display β canonically, or may predetermine
some restrictions on the subsequent reductions of β.
The relation →_c could always be taken as the restriction of →_h to
∪_{α→_gβ} d⁻¹[α]×d⁻¹[β], except for the necessity of representing self-loops of the form
α→_gα.
(6) prevents infinite reductions in →_h that do not accomplish anything
with respect to →_g. (1)-(6) together allow us to find a normal form for α∈S_g by
encoding it in a β∈E∩d⁻¹[α], reducing β to a normal form γ, then taking d(γ) for
the normal form of α. (6′) strengthens (6) to require that the maximum length of
possible →_h − →_c reductions from α∈S_h is a function only of the encoded expres-
sion d(α), not of the particular encoding α. Restriction (6′) enforces the intuitive
rule that invisible book-keeping steps in the host computation must not be so com-
plex that they actually simulate potential guest computation steps that the host
chose not to perform.
Notice that Definition 19.3.1 really has to do with simulating a certain degree
of nondeterminism, rather than parallelism. Simulating all possible degrees of
nondeterminism appears to be necessary for simulating all degrees of parallelism,
but is certainly not sufficient. It is not clear to us how to capture degree of paral-
lelism precisely at an appropriate level of abstraction.
The following lemma is straightforward, but tedious, to prove.
Lemma 19.3.1
(Effective) weak simulation and simulation are reflexive and transitive relations on
reduction systems.
□
While the bounding restriction (6′) might seem excessive, there are certain
weak simulations that are intuitively unacceptable.
Theorem 19.3.1
The S—K calculus effectively weakly simulates the S—K—D calculus.
Proof sketch:
The basic idea is to encode a term Dαβ by an S—K term of the form ρ<ι,κ>. ι
and κ are programs producing possibly infinite lists of static data structures
representing the possible reductions of α and β respectively. ρ is a program using
"lazy" evaluation to alternately probe one step farther in ι and κ respectively, until a
T is found in one or the other, or an F is found in each. When such Boolean
values are found, ρ throws away the lists ι and κ, and produces the appropriate
Boolean value. The decoding function maps ρ<ι,κ> to Dαβ, where α and β are
the last items actually appearing on the lists ι and κ respectively, as long as no T
appears. As soon as a T appears, followed by nil (encoded as <F,F>) to mark
the end of the list, the decoding function maps to T, even though the program ρ has
not yet discovered the T.
□
The weak simulation outlined above is intuitively unsatisfying, because it really
simulates the parallel or behavior of Dαβ by an explicit and rigid alternation of
steps on α and β. Although at first the programs ι and κ may proceed completely
independently, the behavior of ρ forces the first one of them to reach T to wait for
the other one to make at least as many steps before the normal form can be
reached. All of this catching up is hidden within the equivalence class encoding T.
In consequence, arbitrarily long sequences of reductions may go on entirely within
this equivalence class. It is precisely such arbitrarily long reductions within an
equivalence class that are ruled out by (6′) in Definition 19.3.1.
It is useful to apply a geometric intuition to reduction systems by treating
them as (usually infinite) directed graphs whose nodes are the states, and whose
edges show the reduction relation. In the weak simulation of Theorem 19.3.1, con-
sider a term Dαβ, where α and β each reduce by a unique path to T. The graph
representing the part of the S—K—D calculus below Dαβ is suggested in Figure
19.3.1. Reductions down and to the left indicate those applying to α, and those
down and to the right apply to β. The terms along the lower left edge are all of
the form DTγ, where β→*γ, and those on the lower right are of the form DδT,
where α→*δ.
Figure 19.3.2 shows the part of the S—K graph below the encoding ρ<ι,κ> of
Dαβ. In this case, terms that show the same reductions to ι,κ, but different
amounts of reduction involving ρ, are gathered into one blob. Reductions to the
left indicate reductions to ι, those to the right to κ. Reductions involving ρ are hid-
den in the blobs. The lower left edge contains terms of the form ρ<(··· T),γ>,
where κ→*γ; the lower right contains terms of the form ρ<γ,(··· T)>, where
ι→*γ. The dotted lines surround blobs representing T. Notice how arbitrarily
long paths arise along the lower edges within the region representing T. These
long paths violate (6′).
Figure 19.3.1
Figure 19.3.2
Figure 19.3.3
For comparison purposes, Figure 19.3.3 shows the part of the S—K graph
below a term of the form if α then T else β, where α,β each reduce to T. This
form is often called the conditional or of α and β. The lower left edge contains
terms of the form if T then T else γ, and the lower right edge contains terms of
the form if γ then T else T.
Definition 19.3.2
An effective reduction system is universal if it effectively simulates all effective
reduction systems.
□
19.4. The Relative Powers of S—K, S—K—D, and S—K—A
Theorem 19.4.1
The S—K calculus does not simulate the S—K—D calculus.
Proof sketch:
Suppose the contrary, and let b be the bound function in the simulation. Consider
the S—K—D terms α=D(IᵢT)(IᵢT), β=DT(IᵢT), γ=D(IᵢT)T, δ=T, where
i = 2^{b(T)+1}−1. Notice that α is a least common ancestor of β and γ, and the
shortest reduction of α to δ is of length 2^{b(T)+1}+1. Choose S—K terms α′∈d⁻¹[α],
β′∈d⁻¹[β]∩E, β″∈d⁻¹[β], γ′∈d⁻¹[γ]∩E, γ″∈d⁻¹[γ], δ′∈d⁻¹[δ]∩E, such that
α′→*β′→*β″, α′→*γ′→*γ″, α′ is a least common ancestor of β″,γ″, and β″,γ″
cannot be reduced further within d⁻¹[β], d⁻¹[γ] (see Figure 19.4.1). The existence
of such terms is guaranteed by (3) and (4) of Definition 19.3.1, and by the
confluence property. The shortest reduction of α′ to δ′ must be of length at least
2^{b(T)+1}+1. So, by Lemma 19.2.9, β″→ᵐδ′, γ″→ⁿδ′, with 2^{b(T)+1}+1 ≤ 2ᵐ+2ⁿ.
At least one of m,n must be > b(T), contradicting restriction (6′) of Definition 19.3.1.
□
Figure 19.4.1
Theorem 19.4.2
The S—K—D calculus does not simulate the S—K—A calculus.
Proof sketch:
The proof is elementary, since no confluent system may simulate a system that is
not confluent.
□
Theorem 19.4.3
The S—K—A calculus is universal.
Proof sketch:
Let <N,→> be an arbitrary effective reduction system, with nonnegative integers
for states (no loss of generality comes from the use of nonnegative integers). The
function ‾:N→C[S,K] is the encoding of integers into combinatory terms from
Lemma 19.2.2. Let ε∈C[S,K] be an equality test for encoded integers. Let
η∈C[S,K] be a combinatory term for a program that takes an argument ᾱ,
representing α∈N, and computes the encoded integer n(α)‾, where
n(α)=|{β∈N| α→β}|. Let π∈C[S,K] be a combinatory term for a program that
takes an argument ᾱ, and an encoded integer ī, and computes the encoded state β̄,
β∈N, that results from performing the ith possible reduction to α. That is,
∀i,j∈N  i=j => ε ī j̄ →* T  &  i≠j => ε ī j̄ →* F,
∀α∈N  η ᾱ →* n(α)‾,
and
∀α∈N, i∈N  π ᾱ ī →* β̄ᵢ,
where α→βᵢ is the ith reduction from α. Such η,π exist by Lemma 19.2.2. Let
K⁰=I, Kⁱ⁺¹=λx.K(Kⁱx), A⁰=I, Aⁱ⁺¹=λx.A(Kⁱx)Aⁱ. By Lemmas 19.2.2,
19.2.3, there is an e∈C[S,K,A] such that
e ᾱ →* if ε(η ᾱ)0̄ then ᾱ else e(π ᾱ(A^{n(α)} 0̄ 1̄ ··· (n(α)−1)‾))
Now, we may encode each α∈N as e ᾱ ∈ E if α is not in normal form, and as ᾱ ∈ E if α is
in normal form. Let the nonnil values of d be determined by the following rule:
if e ᾱ →* β by a reduction sequence involving only A redexes and redexes in
e ᾱ →* if ε(η ᾱ)0̄ then ᾱ else e(π ᾱ(A^{n(α)} 0̄ 1̄ ··· (n(α)−1)‾)) →*
if T then ᾱ else e(π ᾱ(A^{n(α)} 0̄ 1̄ ··· (n(α)−1)‾)) →* ᾱ
(when α is in normal form), or
e ᾱ →* if ε(η ᾱ)0̄ then ᾱ else e(π ᾱ(A^{n(α)} 0̄ 1̄ ··· (n(α)−1)‾)) →*
if F then ᾱ else e(π ᾱ(A^{n(α)} 0̄ 1̄ ··· (n(α)−1)‾))
(when α is not in normal form), then d(β)=α.
□
Although the S—K—A calculus is technically universal, it is not a good foun-
dation for equational computing. In fact, the universality of S—K—A illustrates
that our definition of simulation captures degree of nondeterminism, rather than
degree of parallelism, since the arbitrary choice operator is intuitively a sequential
but nondeterministic construct. Many, if not most, parallel computations that a
programmer wants to define have uniquely determined final results, in spite of non-
determinism in the computation. The inherently indeterminate behavior of the A
combinator makes it dangerous as a fundamental constructor for determinate com-
putations. The best foundation for equational computing is probably a layered
language, containing a subset of symbols that produce all of the desired deter-
minate computations, and something like the A combinator, to be used in the infre-
quent cases where truly indeterminate behavior is required. There may be other
layers of disciplined behavior that should also be covered by simple sublanguages.
In the next two subsections, we illuminate the behaviors that may be simulated by
the S—K and S—K—D calculi. Section 19.7 develops other systems to simulate
the behavior of the lambda calculus, which apparently cannot be simulated by any
of the combinatory calculi discussed so far.
19.5. The S—K Combinator Calculus Simulates All Simply Strongly Sequential
Term Reduction Systems
In order to simulate an arbitrary simply strongly sequential term reduction system
(see Definition 17.2.3) S over Σ in the S—K calculus, the basic idea is to let con-
tiguous portions of an S—K term represent terms in (Σ∪{ω})_T. Initially, each
such term is of the form f(ω, ···, ω), with exactly one symbol in Σ and the rest
ωs. Whenever one of the represented (Σ∪{ω})_T terms becomes stable, it produces
a direct syntactic representation of itself, that is accessible to operations from
above. As long as a represented (Σ∪{ω})_T term is a strictly partial redex, it
absorbs the topmost symbol from an index position below it (which can only be
done after that index position becomes stable). When a represented (Σ∪{ω})_T
term becomes a complete redex, it produces the associated right hand side of a rule
in S, in the initial form of f(ω, ···, ω) representations.
First, we define the direct syntactic representation of terms in (Σ∪{ω})_T.
This representation is essentially the representation of terms by nested lists in LISP
[McC60], composed with the encoding of lists in S—K of Definition 19.2.3.
Definition 19.5.1
The syntactic encoder syn:(Σ∪{ω})_T → C[S,K] is defined as follows.
Let α′∈(Σ∪V)_T be the result of replacing the occurrences of ω in α by x₁, x₂, ···
in order from left to right.
closureₙ(β) = λx₁.···λxₙ.β
syn(α) = closureₙ(syn′(α′)), where n is the number of ωs in α, and
syn′:(Σ∪V)_T → ({S,K,AP}∪V)_T is defined inductively by the following equations:
syn′(x) = x for x∈V
syn′(aᵢ) = [ī] if ρ(aᵢ) = 0
syn′(aᵢ(α₁, ···, α_{ρ(aᵢ)})) = [ī, syn′(α₁), ···, syn′(α_{ρ(aᵢ)})]
In the lines above, the expressions [ī] and [ī, syn′(α₁), ···] indicate the list
encodings of Definition 19.2.3.
□
Notice that syn(α) contains no variables, and is in normal form.
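The nested-list representation underlying syn can be sketched as in LISP. The signature, symbol numbering, and helper names below are hypothetical, and the final translation into S—K list terms (Definition 19.2.3) is elided.

```python
# A sketch (ours): a first-order term is represented, LISP-style, by a nested
# list whose head is the index of its top symbol, as in syn'.

symbol_index = {"f": 1, "g": 2, "c": 3}   # a hypothetical signature a1, a2, a3

def syn(term):
    """term is ('f', sub1, sub2, ...) or a 0-ary symbol name."""
    if isinstance(term, str):
        return [symbol_index[term]]
    return [symbol_index[term[0]]] + [syn(s) for s in term[1:]]

print(syn(("f", ("g", "c"), "c")))  # -> [1, [2, [3]], [3]]
```

Composing this with the list encoding of Definition 19.2.3 would yield the actual S—K normal forms the definition describes.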
Now, we define the active semantic representation of a term in (Σ∪{ω})_T.
We depend on Lemma 19.2.3, which guarantees the ability to solve multiple simul-
taneous recursive definitions in S—K. Only a finite number of (Σ∪{ω})_T terms
need be represented, so we do not need an infinitary mutual recursion.
Definition 19.5.2
Let P = {a€(ZU (w})y| a is a partial redex,
or a is stable and V B<« 8 is a partial redex)
P is a finite set.
Let Y€CIS,K] implement the selector function for lists, with
Va, "++ am €CIS,K1,1<i<m pila, - > 11 * a.
Let o,,,,€CIS,K] implement a spreading function, such that
Va,By,°-* Bm €CIS,KIjEN
Om,i J 0B, ~*~ Bm —* 0B; * + By (Y 2B) «+» WY pla +l B))
= Bist °° Bs
The semantic encoders sem:Xy —CI[S,K] and sem’:P-C[S,K] are defined by
simultaneous recursion as follows:
sem'(a) —*syn(q) if « is stable
sem'(a) —*closure,, (sem(8)) if a=ca'lw/x),-- - ,w/x,,] and a' ~B € S.
250 19. Toward a Universal Language
sem′(α) x_1 ··· x_m →* σ_{m,i} (ψ 1 x_i) (ψ (ψ 1 x_i) [sem′(α_1), ···, sem′(α_q)]) x_1 ··· x_m
where α is a partial redex but not a redex, the sequencing function for S chooses
the index α′ for α, there are i-1 ωs to the left of the variable in α′, and, for
each symbol a_j, α_j = α′[a_j(ω, ···, ω)/x_i].
sem(x) = x where x ∈ V
sem(a_i) = sem′(a_i) if ρ(a_i) = 0
sem(a_i(α_1, ···, α_ρ(a_i))) = sem′(a_i(ω, ···, ω)) sem(α_1) ··· sem(α_ρ(a_i))
Lemma 19.2.3 guarantees a solution for each sem(α).
□
The definitions of sem and sem′ above may be understood as producing a set of
communicating sequential processes, which partition among themselves the nodes of
a term α ∈ Σ_V, and whose communication network represents the tree structure of
the term. Initially, each node of α is in a separate process, which knows only the
symbol at that node. Each process simultaneously tries to gather up enough nodes
to form a redex, or to learn that its nodes are stable. As long as a process
possesses a strictly partial redex, it requests the head node of the unique son pro-
cess specified by the sequencing function given in the definition of simply strongly
sequential. When a process possesses a whole redex, it performs the reduction
associated with that redex. When a process discovers that its nodes are stable, it
halts and produces messages that can be read by its father when the father process
wishes to gather up more nodes.
It is convenient to use a more reduced form of sem(α) as the canonical
representative of a term. The encoding function e gives the canonical representative.
Definition 19.5.3
The encoding function e: Σ_V → C[S,K] is defined by:
e(α[β_1/x_1, ···, β_m/x_m]) = sem′(α[ω/x_1, ···, ω/x_m]) e(β_1) ··· e(β_m) where α is a
left hand side of a rule, or α[ω/x_1, ···, ω/x_m] is a partial redex, and β_1, ···, β_m
are not strongly stable.
e(α[β_1/x_1, ···, β_m/x_m]) = syn(α)[e(β_1)/x_1, ···, e(β_m)/x_m] where every node in
α is strongly stable, and β_1, ···, β_m are not strongly stable.
□
e partitions a term (considered in graphical form) into contiguous regions that
form maximal partial redexes, and the intervening stable regions. Each partial
redex is represented by the semantic encoding, and each stable region is
represented by the syntactic encoding. The nonoverlapping property of regular
reduction systems (Definition 17.1.3) guarantees that the partition, hence e, is
well-defined.
Lemma 19.5.1
1) syn, sem, and e are one-to-one.
2) sem(α) →* e(α)
3) If α → β ∈ S, and ᾱ → β̄ is an instance of α → β with ᾱ = α[γ_1/x_1, ···, γ_m/x_m],
then
e(ᾱ) = sem′(α) e(γ_1) ··· e(γ_m) →* sem(β)[e(γ_1)/x_1, ···, e(γ_m)/x_m] →* e(β̄)
4) If α is in normal form, then e(α) = syn(α) is also in normal form.
Proof sketch:
All is straightforward except 2. The key fact in showing 2 is that the nonoverlapping
property of Definition 17.1.3 guarantees that when α is a left hand side of a
rule schema, then every proper subterm of α[ω/x_1, ···, ω/x_m], other than ω, is
strongly stable. Thus, the nonroot nodes of a redex convert to the syntactic
encoding, and may be gathered into the appropriate semantic encoding form at the
root of the partial redex.
□
Theorem 19.5.1
The S-K calculus effectively simulates every simply strongly sequential term
reduction system.
Proof:
Let the encoding set E be the range of e. The nonnil values of the decoder d are
defined by the following condition: if α → β_1, ···, α → β_n are all of the possible
one-step reductions of α, and sem(α) →* γ by reducing a subset of the redexes in
the shortest reductions
sem(α) →* e(α) →* sem(β_1), ···, sem(α) →* e(α) →* sem(β_n),
not containing all of the redexes in any one of the preceding reductions, then
d(γ) = α.
→_E = → ∩ (C[S,K] × E).
□
19.6. The S-K-D Combinator Calculus Simulates All Regular Term Reduction
Systems
The basic idea of the simulation of an arbitrary regular term reduction system by
S-K-D is similar to the simulation of simply strongly sequential systems by S-K
in Section 19.5. Instead of choosing a unique index position to absorb into a
partial redex, the simulation tries in parallel to match every redex at every node
in the term. The parallelism between different nodes is treated by similar
parallelism in the S-K-D calculus; parallelism at a single node uses the D
combinator. The problem is that D merely gives the or of its arguments; it does
not tell us which one came out T. In some cases, where the left hand sides of
rules are unifiable, the right hand sides must be the same, so it is not important
to know which of the unifiable left hand sides applied. In other cases, where
right hand sides are different, it is crucial to determine which one to use. The
solution is to first discover that some rule applies at a particular node, then
test, in sequence, each left hand side of a rule. In testing a given rule schema,
check every node in the left hand side of that rule schema in parallel (using the
obvious simulation of a parallel and by a parallel or). The regularity of the
system guarantees that, given that some rule applies, the parallel test whether a
particular rule applies must halt. The following simple example illustrates this
idea. The left hand sides in this example come from [HL79].
Example 19.6.1
Consider the rule schemata f(x,a,b) → 1, f(b,x,a) → 2, f(a,b,x) → 3. The
system defined by these rules is not strongly sequential, because there is no way
to choose which son of an f to work on first. In a simulation of a computation in
this system, suppose the test "does the current node match rule 1 or rule 2 or
rule 3?", carried out with the parallel or, answers T. In order to find out which
of the three rules applies, try the three tests: "is the 2nd son a and the 3rd son
b?", "is the 1st son b and the 3rd son a?", "is the 1st son a and the 2nd son b?",
sequentially, using a parallel and in each one (equivalently, test "is it false
that the 2nd son is not a or the 3rd son is not b?", etc.). Since the
nonoverlapping property holds, the tests that do not correspond to the applicable
rule must differ from the correct one in some position, so the parallel and will
produce an F response, rather than nontermination.
□
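The behavior of the parallel or, and the De Morgan simulation of a parallel and
used in the example, can be sketched by interleaving the steps of possibly
nonterminating tests. This Python sketch uses generators in place of concurrent
combinator reductions; all names in it are ours, not the system's.

```python
def por(*tests):
    """Parallel or: run the tests in round-robin, one step each.
    Answers True as soon as any test answers True; answers False only
    after every test has answered False."""
    pending = dict(enumerate(tests))
    while pending:
        for i, gen in list(pending.items()):
            answer = next(gen)           # one reduction step of test i
            if answer is not None:
                del pending[i]
                if answer:
                    return True
    return False

def negated(test):
    """Negate a running test step by step (N in the combinator setting)."""
    return (None if a is None else not a for a in test)

def pand(*tests):
    # De Morgan: and(t1, ..., tn) = not or(not t1, ..., not tn); the and
    # halts with F as soon as one position fails to match, even if some
    # other position's test never terminates.
    return not por(*(negated(t) for t in tests))

def answers(value, steps):
    """A test that computes for `steps` steps, then answers `value`."""
    def gen():
        for _ in range(steps):
            yield None                   # still computing
        while True:
            yield value
    return gen()

def diverges():
    """A test that never answers, like matching a nonterminating son."""
    def gen():
        while True:
            yield None
    return gen()

print(por(diverges(), answers(True, 5)))      # True
print(pand(diverges(), answers(False, 3)))    # False
```

As in the example, pand terminates with False here even though one of its
conjuncts diverges, because the failing position answers F through the or.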
The simulation of regular systems by S-K-D uses the same syntactic
encoder syn of Definition 19.5.1, but a new semantic encoder.
Definition 19.6.1
Let the regular set S of rule schemata be partitioned into equivalence classes
S_1, ···, S_k, where α_1 → β_1 is equivalent to α_2 → β_2 if α_1 and α_2 are unifiable. By
the definition of regularity, β_1 and β_2 must be the same.
Let P and ψ be as in Definition 19.5.2.
In addition, let K^0 = I, K^{i+1} = λx.K(K^i x), D^0 = I, D^{i+1} = λx.D(K^i x)D^i.
Let N = λx.xFT implement logical negation, and let χ_1 ∈ C[S,K] be a program to
check one symbol in a syntactic encoding against the corresponding position in a
rule schema. That is,
∀ α ∈ (Σ∪{ω})_V, i, j ∈ N:
α agrees with the jth symbol in rule schema i ⟹ χ_1 i j syn(α) →* T, and
α does not agree with the jth symbol in rule schema i ⟹ χ_1 i j syn(α) →* F
χ_2 checks a rule schema in parallel, that is,
∀ α ∈ C[S,K], i ∈ N: χ_2 i α →* N(D^{s_i} (N(χ_1 i 1 α)) ··· (N(χ_1 i s_i α))),
where s_i is the size of the ith left hand side.
χ checks an entire equivalence class of rule schemata in parallel, that is,
∀ α ∈ C[S,K], j ∈ N: χ j α →* D^{n} (χ_2 i_1 α) ··· (χ_2 i_n α),
where i_1, ···, i_n are the numbers of the rule schemata in the jth equivalence class.
Let a: C[S,K] × (Σ∪V)_V → ({S,K,AP}∪V)_V be a function that builds a syntactic
encoding of a term, applying a given combinator to every node. That is,
a(α, x) = x for x ∈ V
a(α, a_i(β_1, ···, β_m)) = α [i, a(α,β_1), ···, a(α,β_m)]
Let σ_j be a program that constructs a right hand side instance of the jth
equivalence class of rule schemata from a left hand side instance, with a specified
operator applied to each node in the right hand side. That is,
∀ γ_1, ···, γ_m, δ ∈ C[S,K]: σ_j δ (syn(α)[γ_1/x_1, ···, γ_m/x_m]) →* a(δ, β)[γ_1/x_1, ···, γ_m/x_m]
where α → β is in the jth equivalence class.
The parallel semantic encoders psem′ ∈ C[S,K] and psem: Σ_V → C[S,K] are defined
by recursion as follows:
psem′ x →* if D^{k}(χ 1 x) ··· (χ k x) then
  (if χ 1 x then σ_1 psem′
   else if χ 2 x then σ_2 psem′
   ···
   else if χ k x then σ_k psem′) x
else x
where k is the number of equivalence classes of rule schemata in S.
psem(α) = a(psem′, α)
□
As before, we define a more reduced encoding than that given by psem.
Definition 19.6.2
The parallel encoder pe: Σ_V → C[S,K] is defined by:
pe(a_i(α_1, ···, α_m)) =
  psem′ [i, pe(α_1), ···, pe(α_m)] if a_i(α_1, ···, α_m) is an ω-potential redex
  [i, pe(α_1), ···, pe(α_m)] if a_i(α_1, ···, α_m) is strongly stable
□
Lemma 19.6.1
1) psem and pe are one-to-one.
2) psem(α) →* pe(α)
3) If α → β ∈ S, and ᾱ → β̄ is an instance of α → β with ᾱ = α[γ_1/x_1, ···, γ_m/x_m],
then pe(ᾱ) →* psem(β)[pe(γ_1)/x_1, ···, pe(γ_m)/x_m] →* pe(β̄)
4) If α is in normal form, then pe(α) = syn(α) is also in normal form.
Proof sketch: analogous to Lemma 19.5.1.
□
Theorem 19.6.1
The S-K-D combinator calculus effectively simulates every regular term
reduction system.
Proof sketch: analogous to Theorem 19.5.1.
□
19.7. The Power of the Lambda Calculus
Another mathematically natural candidate for a universal equational language is
the Lambda Calculus [Ch41].
Definition 19.7.1
Given an infinite set of nullary symbols V, called variables, Λ[V] = {λx | x ∈ V}.
Each λx is intended as a one-argument function symbol.
The Lambda Calculus is the reduction system <Λ, =, →> where
Λ = ({AP}∪V∪Λ[V])_V is the conventional set of lambda terms. As in the
combinator calculus, AP is abbreviated by juxtaposition, and associates to the
left. λx(α) is written λx.α. An occurrence of a variable x in λx.α is a bound
occurrence; all other occurrences of variables are free.
α = β if α may be transformed to β by systematic renaming of bound variables
(e.g., λx.x = λy.y). In the sequel, a lambda term α denotes the equivalence class
{β | α = β}.
→ is defined by
(λx.α)β → α[β/x]
α → β ⟹ αγ → βγ & γα → γβ
where α[β/x] denotes the result of substituting β for each free occurrence of x in
α, renaming bound variables as necessary so that free variables remain free.
□
The Lambda Calculus is, a priori, a weaker candidate for a universal equational
machine language than the S-K Combinator Calculus, because a single reduction
step appears to require an unbounded amount of work, depending on the number of
occurrences of the variable being substituted for.
The Lambda Calculus may be compiled into the S-K Combinator Calculus
by the translation α ↦ ᾱ defined as follows.
Definition 19.7.2
x̄ = x
(λx.x)‾ = I
(λx.y)‾ = Ky, where x ≠ y
(λx.αβ)‾ = S(λx.α)‾(λx.β)‾
(αβ)‾ = ᾱβ̄
The translation of Definition 19.7.2 has been proposed as a method for compiling
the Lambda Calculus into the more primitive Combinator Calculus, because it
satisfies the desirable property of the well-known Theorem 19.7.1 [St72].
Theorem 19.7.1
((λx.α)β)‾ →* (α[β/x])‾, for all x ∈ V, α, β ∈ Λ.
□
Unfortunately, the translation does not satisfy the stronger property α → β ⟹ ᾱ →* β̄.
Example 19.7.1 [St72]
λx.((λy.y)z) → λx.z, yet (λx.((λy.y)z))‾ = S(KI)(Kz), which is in normal form.
Consider the translation of λx.((λy.y)z) into combinators, step by step:
(λx.((λy.y)z))‾ = S(λx.(λy.y))‾(λx.z)‾ = S(λx.I)‾(λx.z)‾ = S(KI)(λx.z)‾ = S(KI)(Kz)
Compare the derivation of (λx.((λy.y)z))‾ to the following one of the subexpression
(λy.y)z:
((λy.y)z)‾ = (λy.y)‾z̄ = Iz
Notice that, by itself, the subexpression (λy.y)z translated to Iz, which reduces to
z. But, inside the binding λx, the I and the z are separated into S(KI)(Kz).
Once the latter expression is applied to an argument, the redex corresponding to
the Iz is created, as in
S(KI)(Kz)w → KIw(Kzw) → I(Kzw) → Kzw → z.
□
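The translation of Definition 19.7.2, and the reductions of Example 19.7.1, can
be replayed mechanically. The following Python sketch uses a term representation
of our own devising: ('ap', f, a) for applications, ('lam', x, b) for
abstractions, and strings for variables and combinators.

```python
def translate(t):
    """The bar translation of Definition 19.7.2: variables unchanged,
    applications translated pointwise, abstractions by the I / K / S cases."""
    if isinstance(t, str):
        return t
    if t[0] == 'ap':
        return ('ap', translate(t[1]), translate(t[2]))
    _, x, body = t                                  # ('lam', x, body)
    return abstract(x, translate(body))

def abstract(x, b):
    if b == x:
        return 'I'                                  # (lam x.x)bar = I
    if isinstance(b, str) or b[0] != 'ap':
        return ('ap', 'K', b)                       # (lam x.y)bar = K y
    return ('ap', ('ap', 'S', abstract(x, b[1])), abstract(x, b[2]))

def rebuild(head, args):
    for a in args:
        head = ('ap', head, a)
    return head

def step(t):
    """One leftmost-outermost S/K/I step, or None if t is in normal form."""
    if not isinstance(t, tuple):
        return None
    spine, args = t, []
    while isinstance(spine, tuple):                 # unwind the application spine
        args.insert(0, spine[2])
        spine = spine[1]
    if spine == 'I' and len(args) >= 1:
        return rebuild(args[0], args[1:])
    if spine == 'K' and len(args) >= 2:
        return rebuild(args[0], args[2:])
    if spine == 'S' and len(args) >= 3:
        f, g, x = args[:3]
        return rebuild(('ap', ('ap', f, x), ('ap', g, x)), args[3:])
    for i, a in enumerate(args):                    # otherwise reduce an argument
        s = step(a)
        if s is not None:
            return rebuild(spine, args[:i] + [s] + args[i+1:])
    return None

def normalize(t):
    while (s := step(t)) is not None:
        t = s
    return t

term = ('lam', 'x', ('ap', ('lam', 'y', 'y'), 'z'))  # lam x.((lam y.y) z)
print(translate(term))   # S(KI)(Kz), in this representation
print(step(translate(term)))                         # None: a normal form
print(normalize(('ap', translate(term), 'w')))       # 'z'
```

Running it confirms the point of the example: the translated term S(KI)(Kz) is
in normal form, and the Iz redex reappears only after application to an argument.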
Example 19.7.1 shows that the translation into combinators enforces outermost
evaluation in some cases, eliminating the possibility of taking an innermost step.
Since, in principle, the two redexes in λx.((λy.y)z) might be reduced concurrently,
the translation into combinators does not provide a simulation of the Lambda Cal-
culus according to Definition 19.3.1. Intuitively, the standard translation into
combinators seems to be deficient if the translated program is to be executed on
parallel hardware. There are a number of improvements to the simple translation
of Definition 19.7.2, which solve the problem of Example 19.7.1 and similar small
examples. None of the known improvements realizes the full parallelism of the
Lambda Calculus, however. The equational program for the Lambda Calculus
presented in Section 9.10 suffers from a similar loss of parallelism.
We have not been able to prove that the Combinator Calculus cannot simulate
the Lambda Calculus, but we conjecture that it cannot. Klop [Kl80b] has
demonstrated some interesting graph-theoretic differences between the λ-calculus
and the S-K calculus, but they do not rule out the possibility of a simulation. It is well-
known in combinatory logic that no finite set of equations between combinators can
provide a full simulation of the Lambda Calculus under the standard translation
(or any of the known variants). Furthermore, informal reflection on Example
19.7.1, and similar examples, shows that, in the Lambda Calculus, reduction of an
outer redex may leapfrog an inner redex to substitute for a variable inside it. Nei-
ther the Combinator Calculus, nor any other regular term reduction system, may
display such behavior. The translations allowed by Definition 19.3.1, however,
include ones that completely change the term structure, so this observation does not
lead to a proof.
There is a variation on the Lambda Calculus, similar to that in Section 9.6,
that preserves all of the apparent parallelism while doing only a small, bounded
amount of work in each reduction step [OS84]. The essence of the variation is
given by the first set of equations in that section, before removing overlap. The
difficult part of the variation is the efficient renaming of bound variables to avoid
capture. This variation cannot be programmed in the equation interpreter, because
it involves an inherent violation of the nonoverlapping restriction. The Lambda
Calculus has Property A of Definition 19.2.5, so it cannot simulate the parallel or.
The apparent deficiency of combinators resulting in the inability to simulate the
Lambda Calculus is separate, then, from the deficiency with respect to the parallel
or. Having found these two deficiencies, we should be very careful about accepting
any reduction system, even the Lambda Calculus plus the parallel or, as sufficient
for parallel computation, without a solid proof.
19.8. Unsolved Problems
The definition of simulation in this section is plausible, but is not precisely the right
one. The results of this section should be taken as a critique of the definition of
simulation, as much as statements about the particular reduction systems studied.
Besides the essential problem of characterizing parallelism, instead of just non-
determinism, the definition of simulation may not be exactly right even for captur-
ing the degree of nondeterminism.
It is disturbing that, although the intuitive difference between the S-K and
S-K-D calculi has to do with optional sequential or parallel computation versus
required parallelism, the proof that S-K cannot simulate S-K-D hinges on simple
abstract graph-theoretic properties of the two calculi. We had expected the
proof that S-K does not simulate S-K-D to use recursion theory, since the
critical difference between S-K and S-K-D has to do with the existence of a
computable function to pick the next required computation step in S-K, and the lack
of any such computable function in S-K-D. In particular, it appears that
effective simulation of S-K-D by S-K should contradict standard recursion-
theoretic results by providing a recursive separation of {i | φ_i(0) = 0} and
{i | φ_i(0) = 1}. We could construct an S-K-D term of the form Dαβ, where α
tests φ_i(0) = 0, and β tests φ_i(0) = 1, simulate it with an S-K term γ, and if the
leftmost-outermost reduction of γ simulated reductions to α but not β, we should
be able to conclude that φ_i(0) ≠ 1, and vice versa. We have not succeeded in
proving that the leftmost-outermost reductions of γ cannot simulate interleaved
reductions to α and β in such a case, although such a simulation looks impossible.
Whether or not the definition of simulation is exactly right, the simulations of
simply strongly sequential systems by S-K, and regular systems by S-K-D, are
intuitively satisfying, and should be allowed by any reasonable definition. It would
be useful to know other simple reduction systems that simulate all systems in some
natural restricted classes. In particular, the existence of a confluent effective
reduction system that effectively simulates all other confluent effective reduction
systems is an important open question. Also, a strongly sequential reduction
system that simulates all others of its class would be useful. We conjecture that
S-K does not simulate all strongly sequential systems.
An interesting hierarchy could develop around a natural sequence of more and
more powerful combinators. We conjecture that the S-K-P calculus, using the
positive parallel or with the rules PTα → T, PαT → T, does not simulate the
S-K-D calculus, nor does the S-K-D calculus simulate the S-K-E calculus with
the equality test defined by Eαα → T, and that the S-K-E calculus does not
simulate the S-K-F calculus with the rule Fαα → α. All of these systems are
confluent [Kl80a, Ch81]. A classification of the power of combinatory systems
should include the λ-calculus and the S-K-C (parallel if) calculus with the rules
CTαβ → α, CFαβ → β, and Cαββ → β, which may be even more powerful than
S-K-F.
20. Implementation of the Equation Interpreter
Implementation of the algorithms, discussed in Section 18, for pattern matching,
sequencing, and reduction, is a rather well-defined programming task. The design
and coordination of the algorithmically conventional syntactic processors involved
in the interpreter constitute the more interesting implementation problems, so that
aspect is discussed in this section.
20.1. Basic Structure of the Implementation
The goals of the interpreter implementation were to determine the practicality of
the novel aspects of an equational interpreter as a computing engine, and provide
the facility for preliminary experiments in the usefulness of the equational pro-
gramming language as a programming language. These two goals are in some
ways contrary to one another. Preliminary experiments in equational programming
could be performed well on a very naive interpreter that would execute small pro-
grams with a combined preprocessing and running time of a few seconds. In order
to test the practicality of evaluation strategies as sources of computing power, we
needed to provide better run-time performance than was needed for the program-
ming experiments, even at the cost of substantial preprocessing work. We decided
to emphasize the first goal, as long as the preprocessing time could be kept toler-
able for small programs. This decision led to a two-dimensional structure for the
interpreter, as shown in Figure 20.1.1. The vertical dimension shows the processing
of an equational program into tables suitable to drive a fast interpreter. The hor-
izontal dimension shows an input term being reduced to an output normal form by
the resulting interpreter.
Figure 20.1.1
(An equational program is processed by the preprocessor into tables for pattern
matching, which drive the interpreter; the interpreter reduces an input term to
an output normal form.)
Very early experience convinced us that even simple programming experiments
would be prohibitively difficult without a good syntax for terms. In particular, we
started out naively with the standard mathematical notation for terms (Standmath
of Section 4.1). Although this is fine for metanotation, when we used it to write
equations for a pure LISP interpreter, writing cons(A, cons(B, nil)) instead of the
special LISP notation (A B), the result was very difficult to manage. Since pure
LISP is defined by a reasonably small and simple equational program, we decided
that the fault was in the notation. At the same time we realized that LISP nota-
tion is unacceptably clumsy for other problems, such as the lambda calculus. So,
we decided to separate parsing and other syntactic manipulations completely from
the semantic essentials of the interpreter, allowing for a library of different syn-
taxes to be chosen for different problems.
In this section the semantic essentials of the interpreter and preprocessor are
called the core programs, and the parsers and other syntactic transformers that
analyze the input and pretty-print the output are called syntactic shells. We
concentrated the implementation effort on the performance of the core programs, since
syntactic problems are already rather well understood. Design effort connected
with the shells went mostly toward flexibility of their interfaces, rather than the
internals of the parsers, etc.
We decided that it was important to be able to vary, not only the syntactic
forms seen by a user, but also the way in which terms are presented to the core
programs. The issue first arose regarding the efficiency of the pattern matcher -- in
some cases pattern-matching tables could be much smaller if each term of the form
f(A, B,C) were Curried into apply (apply (apply (f, A), B), C), reducing the
arity of symbols to a maximum of 2. In other cases, Currying could be wasteful,
and there are many other variations in the presentation of terms. In order to allow
flexibility in choosing an efficient internal form (the inner syntax of Section 4.4),
while guaranteeing functional equivalence at the user's level, we separated the
syntactic shell into two levels, shown in Figure 20.1.2.
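The Currying transformation mentioned above can be sketched in a few lines of
Python; the tuple representation of terms is ours, chosen only for illustration.

```python
# A sketch of Currying: every application f(A, B, C) becomes
# apply(apply(apply(f, A), B), C), so no symbol needs arity greater
# than 2 in the internal form seen by the pattern matcher.

def curry(term):
    """term is a symbol string or a tuple (head, arg1, ..., argn)."""
    if isinstance(term, str):
        return term
    head, *args = term
    result = head
    for arg in args:
        result = ('apply', result, curry(arg))
    return result

print(curry(('f', 'A', 'B', 'C')))
# ('apply', ('apply', ('apply', 'f', 'A'), 'B'), 'C')
```

As the text notes, whether this form shrinks or inflates the pattern-matching
tables depends on the program, which is why the internal form is configurable.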
We use the terminology of transformational grammars [Ch65] to discuss the
levels of syntactic processing, even though we do not use the transformational for-
malism to define that processing. The source text produced by a user as input, or
provided to a user as output, is called concrete syntax. The parsed form of con-
crete syntax, showing the tree structure inherent in its text, is called surface
abstract syntax -- from the surface structures of transformational grammars.
Essentially, the translations between concrete syntax and surface abstract syntax
are context-free, so a list of variables in the concrete syntax is still a list of vari-
ables, in the same order, in the surface abstract syntax. Surface abstract syntax is
transformed in non-context-free ways into deep abstract syntax. In particular,
declarations of symbols and variables are processed, and each occurrence of a
symbol is marked with the information in its declaration.
Figure 20.1.2
(The syntactic shell is split into two levels: syntactic transformers in the
shells, such as pre.in and int.out, translate between concrete syntax and surface
abstract syntax, and between surface and deep abstract syntax; the semantic
preprocessor and the other core programs lie between the shells.)
This sort of processing is
called "semantic analysis" in much of the literature on compiling, but we believe it
is more illuminating to think of it as another component of syntactic analysis.
Structural transformations, such as Currying, are also performed in the translation
from surface to deep structure. Future versions of the interpreter may use
transformations on abstract syntax to implement modular program constructors of
the sort discussed in Section 14. The uniformity of the representation of abstract
syntax at different levels allows both sorts of transformations, as well as the
non-context-free syntactic processing, to be implemented by equational programs.
Communication between different portions of the equation interpreter system
is always done by files of characters. These files are implemented as pipelines
whenever possible. Except for the input produced by, and the output seen by, a
user, and the Pascal code produced by the preprocessor for inclusion in the inter-
preter, all communication follows a standard format for abstract symbolic informa-
tion, in which an abstract symbol is given by its length, type, and a descriptive
string of characters. Section 20.2 describes this abstract symbolic format. A
thorough understanding of symbol format is crucial to the sophisticated user who
wishes to produce his own syntactic processors.
20.2. A Format for Abstract Symbolic Information
Input to, and output from, the cores of the preprocessor and interpreter are always
presented in a versatile and abstract syntactic form, which is computationally
trivial to parse. Use of this special form is intended to remove all questions of syn-
tactic processing from the core programs, both in order to simplify and clarify
these programs, and to allow great flexibility in manipulating their syntactic shells.
Because the equation interpreter is used as part of its own syntactic processing, it is
easy to lose orientation when thinking about symbolic files. The key idea is that
the format of a symbolic file must always be appropriate to the equational program
for which it is the immediate input or output.
The details of the representation of abstract syntax described below were
designed for ease of experimentation. Although intended for internal use in com-
putations, they are readable enough to be useful for debugging. As a result, the
representations are highly inefficient, taking more space in most cases than the
concrete syntax. A more mature version of the equation interpreter will compress the
internal representations of abstract syntax, sacrificing readability for debugging
purposes in favor of efficiency.
A file of abstract symbols contains a contiguous sequence of abstract symbols,
with no separators and no terminator. Each abstract symbol is of the form
length type content
presented contiguously with no separators. length is given as a nonnegative integer
in normal base-10 notation, with no leading zeroes (except that zero itself is
presented as "0"). type is a single character, with a meaning described later. con-
tent is a sequence of characters, the number of characters being exactly the given
length. The idea for this representation was taken from FORTRAN’s FORMAT
statements.
There are six types of abstract symbol that are important to the equation
interpreter system. The motivation for each of these types refers to the preprocessor or
interpreter core program for which the symbol is an immediate input or output. A
symbol that is presented to an interpreter has the same meaning as if it were
presented to the preprocessor that produced that interpreter.
M Metasymbol: a symbol with a special predetermined meaning to the system.
L Literal symbol: a symbol whose meaning is given by the user in his definitions.
A Atomic symbol: a symbol with no discernible structure or special meaning.
C Character string: a symbol intended to denote the very sequence of characters
that is its content.
I Integer symbol: a symbol intended to denote an integer number.
T Truth symbol: a symbol denoting a truth value.
Examples of abstract symbols are "1M(", "5Aabcde", "10Cdefinition", "1TT", "1TF",
"4I1111", and "4I-124". Informally, we will refer to abstract symbols by their
contents when the type and length are clear from context.
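Because each symbol carries its own length, the format can be read with no
lookahead. The following Python sketch of a reader is ours; the function name
and error handling are illustrative only.

```python
# A sketch of a reader for the abstract symbol format: each symbol is a
# base-10 length, a one-character type code (M, L, A, C, I, T), and then
# exactly `length` characters of content, with no separators.

def read_symbols(text):
    symbols, pos = [], 0
    while pos < len(text):
        start = pos
        while text[pos].isdigit():       # the length field, no leading zeroes
            pos += 1
        length = int(text[start:pos])
        kind = text[pos]                 # the type character
        pos += 1
        content = text[pos:pos + length]
        assert len(content) == length, "truncated symbol"
        pos += length
        symbols.append((kind, content))
    return symbols

print(read_symbols('1M(5Aabcde10Cdefinition1TT4I-124'))
# [('M', '('), ('A', 'abcde'), ('C', 'definition'), ('T', 'T'), ('I', '-124')]
```

Note that parsing is trivial precisely because the length field is explicit, the
property the text attributes to the design (and to FORTRAN's FORMAT statements).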
Metasymbols include left and right parentheses, "(" and ")", and codes for the
predefined classes of symbols, equations, and operations. Integer symbols are
presented in normal base-10 notation, with a single preceding minus sign, "-", for
negative integers. Truth symbols include "T" for truth and "F" for falsehood. The
other three types of symbol are very easily confused, and require some initial study.
Character strings are taken by the equation interpreter to which they are
presented, and by the equation interpreter produced by the preprocessor to which
they are presented, as textual data to be manipulated by concatenation, extraction
of substrings, etc. The lexical relationships between different character strings may be
important to the interpreter program that manipulates them. Literal symbols,
presented to a preprocessor, are intended to be given meanings by the equations
that the preprocessor is reading; presented to an interpreter, they are intended to
have the meanings given by the equations from which that interpreter was pro-
duced. The lexical structure of literal symbols is irrelevant. Atomic symbols are
opaque symbols with no meanings beyond their identities. The lexical structure of
atomic symbols is irrelevant to an equational program that is processing them, but
may become relevant in a later step if some nonequational syntactic transforma-
tion, such as the content operation of Section 13.2, maps atomic symbols to some
other types of symbols.
Either an atomic symbol, or a character string, may have a content with a spe-
cial meaning to an earlier or later step in a sequence of programs. Thus, "4Acons"
is an atomic symbol, whose content is "cons". That content could be accessed by a
nonequational program at some point to produce the literal symbol "4Lcons". This
near-brush with confusion is necessary in order for an equational program to be
part of the syntactic processor for the equation preprocessor. Thus, equations
defining a syntactic processor may use literal symbols, such as "4Lcons", in order to
perform syntactic transformations on expressions containing the atomic symbol
"4Acons", which will later become the literal symbol "4Lcons" when presented to
the core of the preprocessor.
The metasymbols that are meaningful to the equation interpreter system are:
"(",")" used to present the tree-structure of a term as input to or output from the
preprocessor or the interpreter
The remaining metasymbols are used only in input to the preprocessor
"Vy" marks an address of a formal variable
"U" union of syntactic classes in variable qualification
"2" the universal syntactic class
"#" the empty syntactic class
"A" the class of atomic_symbols
"I" the class of integer_numerals
"C" the class of character_strings
"T" the class of truth_values
"+", "-", "*", "/", "=", "<" the predefined functions, used on right-hand sides
Text containing the metasymbols "(" and ")" is intended to represent a term in the
natural way. Such text will often be displayed for informal discussion with com-
mas, spaces, and indentation to improve readability, although no such notation ap-
pears inside the machine.
20.3. Syntactic Processors and Their Input/Output Forms
The main goal in the organization of the syntactic processors is versatility. In ad-
dition to the variations in the external concrete syntax typed and seen by the user,
described in Section 4, there are variations in the form in which material is
presented to the core programs. These variations in internal syntax are provided
because different encodings of terms may have radically different effects on the
efficiency of the pattern-matching algorithms in the interpreter. The reasons for
these effects are explained in Section 18.2, and the details of the different internal
syntaxes are given in Section 4.
Figure 20.3.1 refines Figure 20.1.2 to show all of the levels of syntactic
analysis. The configurations for the preprocessor input analyzer, pre.in, the inter-
preter input analyzer, int.in, and the interpreter output pretty-printer, int.out, are
essentially the same, except that pre.in has one more step than the others, and
int.out transforms from inner form into outer form -- the opposite direction from
pre.in and int.in. In describing the different levels of syntactic processing, we will
always refer to pre.in, which has the most complex forms. The forms for int.in
and int.out are merely the term portions of pre.in syntaxes.
Concrete syntax is defined in Sections 3, 4 and 8. Surface abstract syntax is
intended to represent the concrete syntax directly, with purely typographical con-
siderations removed. Surface abstract syntax is in the form:
sspec(ssymspec(list_of_symspecs), sequspec(list_of_variables, list_of_equations))
"sspec", "ssymspec", "sequspec", and all other explicitly-given symbols, are literal
symbols. The lists are represented in the usual way by the literal symbols "cons"
[Figure 20.3.1: the levels of syntactic analysis, refining Figure 20.1.2. A parser, type assignment, a syntactic transformer, and variable resolution carry surface structure down to deep structure; the semantic processor and semantic interpreter operate on deep structure; syntactic transformers and the syntactic shells (pre.in, int.in, int.out) form the outer layers around the core.]

Figure 20.3.1
and "nil". Elements of the list_of_symspecs are of the two forms:
usersym(list_of_symbols, arity)
predefsym(list_of_symbols)
Elements of the list_of_symbols and list_of_variables are of the forms:
litsym(atomic_symbol)
atomsym(atomic_symbol)
metasym(atomic_symbol)
The contents of these atomic_symbols will become literal symbols, atomic symbols,
and metasymbols respectively when they reach the core program. Arity is of the
form
intnum(atomic_symbol)
The contents of this atomic_symbol will become an integer. Elements of the
list_of_equations are of the two forms:
qualequ(term,term,list_of_quals)
predefequ(list_of symbols)
Each element of the list_of_quals is of the form:
qualify(list_of_variables, list_of_terms)
Terms within qualifiers may include the binary literal operator "qualterm" to intro-
duce nested qualifications in the form:
qualterm(term, list_of_quals)
Qualifiers may be nested.
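To make the element forms concrete, the cons/nil list convention and two of the forms above can be sketched in Python. This is an illustrative encoding only; the tuple layout and helper names (from_list, the placeholder subterms) are assumptions for exposition, not the interpreter's actual data structures.

```python
# A hedged sketch of the surface abstract syntax element forms,
# using Python tuples in place of the interpreter's terms.  The
# first tuple component is the literal symbol; the rest are the
# subterms.  Helper names here are illustrative assumptions.

def cons(head, tail):
    return ("cons", head, tail)

NIL = ("nil",)

def from_list(items):
    """Build a cons/nil list term from a Python list."""
    term = NIL
    for item in reversed(items):
        term = cons(item, term)
    return term

# usersym(list_of_symbols, arity): a user-defined symbol declaration
f_decl = ("usersym", from_list([("litsym", "f")]), ("intnum", "2"))

# qualequ(term, term, list_of_quals): an equation with one qualifier
equation = ("qualequ",
            ("lhs_term",),            # placeholder left-hand side
            ("rhs_term",),            # placeholder right-hand side
            from_list([("qualify",
                        from_list([("metasym", "x")]),
                        from_list([("pattern",)]))]))

print(f_decl[0])      # usersym
print(equation[0])    # qualequ
```

A nested qualification would simply appear as a "qualterm" tuple inside one of the list_of_terms positions, mirroring the form given in the text.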
Because the syntactic processors will perform syntactic, rather than semantic,
manipulations on terms, each term is represented somewhat indirectly by an
abstract syntax showing its applicative structure, as described in Section 13.2.
Thus,
f(a,b,c,d)
is represented by
multiap[litsym[f]; (a b c d)]
In general, the structure of a specification given by the keywords in concrete syntax
is translated directly into the structure of the surface abstract syntax term, shown
by its literal symbols. Tokens other than keywords in the surface concrete syntax,
including user-defined symbols, symbols intended to indicate predefined classes of
symbols or equations, integers, character strings, and atomic symbols, are all
translated into atomic symbols in the surface abstract syntax. All instances of in-
tegers, character strings, truth values, literals, and metasymbols are marked by the
literal operators "intnum", "charstr", "truthval", "litsym", and "metasym" as shown
above. Atomic symbols are not yet marked with "atomsym" because they have not
yet been distinguished from variables.
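The applicative representation just described can be sketched as follows. The Python rendering (tuples for the bracketed terms, and a show function that recovers conventional notation) is a hypothetical illustration of the idea, not the implementation's own encoding.

```python
# A sketch of the "multiap" applicative representation:
# f(a,b,c,d) becomes multiap[litsym[f]; (a b c d)].
# Names follow the text; the functions are illustrative assumptions.

def multiap(operator, arguments):
    """Represent operator applied to a list of argument terms."""
    return ("multiap", operator, tuple(arguments))

def litsym(name):
    return ("litsym", name)

def atom(name):
    # surface abstract syntax: not yet marked as atomsym or variable
    return ("atomic_symbol", name)

# f(a, b, c, d) in applicative form
term = multiap(litsym("f"), [atom(n) for n in "abcd"])

def show(t):
    """Pretty-print a multiap term back into conventional notation."""
    if t[0] == "multiap":
        op, args = t[1], t[2]
        return show(op) + "(" + ",".join(show(a) for a in args) + ")"
    return t[1]

print(show(term))   # f(a,b,c,d)
```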
Deep abstract syntax is similar to surface abstract syntax, but information has
been organized in a form more convenient to the core program than is the surface
form. In particular, user-defined and predefined symbol declarations are separated,
arities and similar tags are distributed over lists of symbols, symbols are marked
appropriately as variables, atomic symbols, integer numerals, character strings, and
literals. Finally, all qualifications on variables replace the left-hand-side occurrences of variables that they qualify, and right-hand-side occurrences of variables are shown as
varaddr (list_of_integers)
where the list_of_integers gives the sequence of tree branches to be followed to
find the corresponding left-hand-side occurrence of the variable, and equation sche-
mata are substituted for invocations of predefined classes of equations.
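The address scheme can be illustrated with a small tree search. Everything here is an assumed encoding for exposition: a term is a tuple whose first component is its symbol, and branches are numbered from 1, matching the idea (though not necessarily the exact conventions) of varaddr.

```python
# A sketch of varaddr(list_of_integers): the address of a variable
# occurrence is the sequence of branch numbers followed from the
# root of the left-hand side to reach it.  Hypothetical encoding:
# a term is (symbol, child, child, ...); a variable is a bare string.

def find_address(term, variable):
    """Return the branch path to the first occurrence of variable,
    or None if it does not occur in term."""
    if term == variable:
        return []
    if isinstance(term, tuple):
        for branch, child in enumerate(term[1:], start=1):
            path = find_address(child, variable)
            if path is not None:
                return [branch] + path
    return None

# left-hand side:  f(g(x), y)
lhs = ("f", ("g", "x"), "y")
print(find_address(lhs, "x"))   # [1, 1]
print(find_address(lhs, "y"))   # [2]
```

A right-hand-side occurrence of x would then be rendered as varaddr with the path [1, 1], so that all variable names can be eliminated.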
A specification in deep abstract syntax is of the form:
dspec(dsymspec(list_of_user_syms, list_of_predef_syms),
     dequspec(list_of_equations))
Elements of the list_of_user_syms are of the form:
usersym(symbol, arity)
Elements of the list_of_predef_syms are of the form:
predefsym (symbol)
Elements of the list_of_equations are of the form:
equate(term, term)
As in the surface abstract syntax, symbols given explicitly above are literal sym-
bols, symbols taken from the surface concrete syntax are atomic symbols, and
terms are represented in syntactic form with the operator "multiap". Every atomic
symbol is now marked in one of the following forms to show its intended type:
metasym(symbol)
litsym(symbol)
atomsym(symbol)
charstr(symbol)
truthval(symbol)
intnum(symbol)
In order to accommodate inner syntactic translations, such as Currying, the
transformation of surface abstract syntax to deep abstract syntax goes in three
steps.
1. Declarations of literal symbols and variables are processed, and each oc-
currence of a symbol is marked by the appropriate tag. A variable x is given
as qualvar[x; qualifications] if it is on the left-hand side of an equation, and
simply variable[x] if it is on the right-hand side.
2. Any inner syntactic transformations, such as Currying, are performed.
3. Right-hand-side variables are replaced by the corresponding left-hand-side ad-
dresses, and all variable names are eliminated.
Notice that this order of work is critical -- syntactic transformations may depend
on the types of symbols encountered, and variable addresses must be assigned
based on the transformed versions of the left-hand sides. Immediately before it is
presented to a core program, each term in deep abstract syntax is transformed by
the content operation of Section 13.2, to produce the semantically appropriate
terms without mediation by the multiap symbol.
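The effect of the content operation can be sketched as follows, under the same hypothetical tuple encoding used above; the authoritative definition is the one in Section 13.2.

```python
# A sketch of the "content" operation: convert a syntactic multiap
# term into the direct application it denotes, removing the multiap
# mediator.  Encoding and helper names are illustrative assumptions:
# ("multiap", op, (arg1, ..., argn)) becomes (op', arg1', ..., argn').

def content(term):
    """Recursively replace multiap(op, (a1 .. an)) by the direct
    application (op', a1', ..., an')."""
    if isinstance(term, tuple) and term[0] == "multiap":
        op = content(term[1])
        args = tuple(content(a) for a in term[2])
        return (op,) + args
    return term

# syntactic form of f(g(x), y)
syntactic = ("multiap", "f", (("multiap", "g", ("x",)), "y"))
print(content(syntactic))   # ('f', ('g', 'x'), 'y')
```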
Bibliography
AC75 Aho, A. V. and Corasick, M. J. Efficient String Matching: An Aid to Bibliographic Search, Communications of the ACM 18:6 (1975) pp. 333-340.
AHU74 Aho, A. V., Hopcroft, J. E., Ullman, J. D. The Design and Analysis of Computer Algorithms, Addison-Wesley (1974).
AHU83 Aho, A. V., Hopcroft, J. E., Ullman, J. D. Data Structures and Algorithms, Addison-Wesley (1983).
AU72 Aho, A. V. and Ullman, J. D. The Theory of Parsing, Translation, and Compiling, Prentice-Hall (1972).
AW76 Ashcroft, E. and Wadge, W. Lucid - A Formal System for Writing and Proving Programs, SIAM Journal on Computing 5:3 (1976) pp. 336-354.
AW77 Ashcroft, E. and Wadge, W. Lucid, a Nonprocedural Language with Iteration, Communications of the ACM 20:7 (1977) pp. 519-526.
At75 Atkins, D. E. Introduction to the Role of Redundancy in Computer Arithmetic, IEEE Computer 8:6 (1975) pp. 74-76.
Av61 Avizienis, A. Signed-Digit Number Representations for Fast Parallel Arithmetic, Institute of Radio Engineers Transactions on Electronic Computers (1961) p. 389.
Ba74 Backus, J. Programming Language Semantics and Closed Applicative Languages, ACM Symposium on Principles of Programming Languages (1974) pp. 71-86.
Ba78 Backus, J. Can Programming Be Liberated from the von Neumann Style? A Functional Style and its Algebra of Programs, Communications of the ACM 21:8 (1978) pp. 613-641.
BB79 Bauer, F. L., Broy, M., Gnatz, R., Hesse, W., Krieg-Bruckner, B., Partsch, H., Pepper, P., Wossner, H. Towards a Wide Spectrum Language to Support Program Development by Transformations, Program Construction: International Summer School, Lecture Notes in Computer Science v. 69, Springer-Verlag (1979) pp. 543-552; SIGPLAN Notices 17:12 (1978) pp. 15-24.
BS76 Belnap, N. D. and Steel, T. B. The Logic of Questions and Answers, Yale University Press (1976).
BL77 Berry, G. and Lévy, J.-J. Minimal and Optimal Computations of Recursive Programs, 4th ACM Symposium on Principles of Programming Languages (1977) pp. 215-226.
BL79 Berry, G. and Lévy, J.-J. Letter to the Editor, SIGACT News v. 11, no. 1 (1979) pp. 3-4.
Bi67 Bishop, E. Foundations of Constructive Analysis, McGraw-Hill, New York (1967).
Bj72 Bjorner, D. Finite State Tree Computations (Part I), IBM Research Technical Report RJ 1053 (#17598) (1972).
Br69 Brainerd, W. S. Tree generating regular systems, Information and Control 14 (1969) pp. 217-231.
Br79 Bridges, D. S. Constructive Functional Analysis, Pitman, London (1979).
Br76 Bruynooghe, M. An Interpreter for Predicate Logic Programs Part I, Report CW10, Applied Mathematics and Programming Division, Katholieke Universiteit, Leuven, Belgium (1976).
BG77 Burstall, R. M. and Goguen, J. A. Putting Theories Together to Make Specifications, 5th International Joint Conference on Artificial Intelligence, Cambridge, Massachusetts (1977) pp. 1045-1058.
CaJ72 Cadiou, J. Recursive Definitions of Partial Functions and Their Computations, Ph.D. Dissertation, Computer Science Dept., Stanford University (1972).
Ca76 Cargill, T. Deterministic Operational Semantics for Lucid, Research Report CS-76-19, University of Waterloo (1976).
Ch80 Chew, L. P. An Improved Algorithm for Computing With Equations, 21st Annual Symposium on Foundations of Computer Science (1980) pp. 108-117.
Ch81 Chew, L. P. Unique Normal Forms In Term Rewriting Systems With Repeated Variables, 13th Annual ACM Symposium on Theory of Computing (1981) pp. 7-18.
Ch65 Chomsky, N. Aspects of the Theory of Syntax, MIT Press, Cambridge MA (1965).
Ch41 Church, A. The Calculi of Lambda-Conversion, Princeton University Press, Princeton, New Jersey (1941).
deB72 de Bruijn, N. G. Lambda Calculus Notation with Nameless Dummies, Nederl. Akad. Wetensch. Proc. Series A 75 (1972) pp. 381-392.
CG80 Clarke, T. J. W., Gladstone, P. J. S., MacLean, C. D., and Norman, A. C. SKIM - The S, K, I Reduction Machine, 1980 LISP Conference, Stanford University (1980) pp. 128-135.
CF58 Curry, H. B. and Feys, R. Combinatory Logic v. I, North-Holland, Amsterdam (1958).
DS76 Downey, P. J. and Sethi, R. Correct Computation Rules for Recursive Languages, SIAM Journal on Computing 5:3 (1976) pp. 378-401.
DST80 Downey, P. J., Sethi, R., and Tarjan, R. E. Variations on the Common Subexpression Problem, Journal of the ACM 27:4 (1980) pp. 758-771.
Di76 Dijkstra, E. W. A Discipline of Programming, Prentice-Hall, Englewood Cliffs, NJ (1976).
Fa77 Farah, M. Correct Compilation of a Useful Subset of Lucid, Ph.D. Dissertation, Department of Computer Science, University of Waterloo (1977).
FW76 Friedman, D. and Wise, D. Cons should not evaluate its arguments, 3rd International Colloquium on Automata, Languages and Programming, Edinburgh University Press (1976) pp. 257-284.
FGJM85 Futatsugi, K., Goguen, J. A., Jouannaud, J.-P., Meseguer, J. Principles of OBJ2, 12th Annual Symposium on Principles of Programming Languages, New Orleans LA (1985).
GJ79 Garey, M. R. and Johnson, D. S. Computers and Intractability -- a Guide to the Theory of NP-Completeness, W. H. Freeman, New York (1979).
GS78 Guibas, L. and Sedgewick, R. A Dichromatic Framework for Balanced Trees, 19th Symposium on Foundations of Computer Science (1978) pp. 8-21.
GH78 Guttag, J. V. and Horning, J. J. The Algebraic Specification of Abstract Data Types, Acta Informatica 10:1 (1978) pp. 1-26.
GHM76 Guttag, J., Horowitz, E., and Musser, D. Abstract Data Types and Software Validation, Information Science Research Report ISI/RR-76-48, University of Southern California (1976).
Go77 Goguen, J. A. Abstract Errors for Abstract Data Types, IFIP Working Conference on Formal Description of Programming Concepts, E. J. Neuhold, ed., North-Holland (1977).
Go84 Goguen, J. A. Parameterized Programming, IEEE Transactions on Software Engineering 10:5 (1984) pp. 528-544.
HM76 Henderson, P. and Morris, J. H. A Lazy Evaluator, 3rd ACM Symposium on Principles of Programming Languages (1976) pp. 95-103.
Ho62 Hoare, C. A. R. Quicksort, Computer Journal 5:1 (1962) pp. 10-15.
Ho78 Hoffmann, C. Design and Correctness of a Compiler for a Nonprocedural Language, Acta Informatica 9 (1978) pp. 217-241.
HO79 Hoffmann, C. and O'Donnell, M. Interpreter Generation Using Tree Pattern Matching, 6th Annual Symposium on Principles of Programming Languages (1979) pp. 169-179.
HO82a Hoffmann, C. and O'Donnell, M. Pattern Matching in Trees, Journal of the ACM (1982) pp. 68-95.
HO82b Hoffmann, C. and O'Donnell, M. Programming With Equations, ACM Transactions on Programming Languages and Systems (1982) pp. 83-112.
HO83 Hoffmann, C. and O'Donnell, M. Implementation of an Interpreter for Abstract Equations, 11th Annual ACM Symposium on Principles of Programming Languages (1984) pp. 111-120.
HD83 Hsiang, J. and Dershowitz, N. Rewrite Methods for Clausal and Non-Clausal Theorem Proving, 10th EATCS International Colloquium on Automata, Languages, and Programming, Spain (1983).
HL79 Huet, G. and Lévy, J.-J. Computations in Non-ambiguous Linear Term Rewriting Systems, IRIA Technical Report #359 (1979).
HO80 Huet, G. and Oppen, D. Equations and Rewrite Rules: a Survey, Formal Languages: Perspectives and Open Problems, R. Book, ed., Academic Press (1980).
Hu52 Huffman, D. A. A Method for the Construction of Minimum-Redundancy Codes, Proceedings of the Institute of Radio Engineers 40 (1952) pp. 1098-1101.
Ir61 Irons, E. T. A Syntax Directed Compiler for ALGOL 60, Communications of the ACM 4:1 (1961) pp. 51-55.
Jo78 Johnson, S. C. Yacc: Yet Another Compiler Compiler, in UNIX Time-Sharing System: UNIX Programmer's Manual, Volume 2A, Bell Telephone Laboratories.
Jo77 Johnson, S. D. An Interpretive Model for a Language Based on Suspended Construction, Technical Report #68, Dept. of Computer Science, Indiana University (1977).
KM77 Kahn, G. and MacQueen, D. B. Coroutines and Networks of Parallel Processes, Information Processing 77, B. Gilchrist ed., North-Holland (1977) pp. 993-998.
KP78 Kahn, G. and Plotkin, G. Domaines Concrets, Tech. Rep. 336, IRIA Laboria, LeChesnay, France (1978).
KM66 Karp, R. M. and Miller, R. E. Properties of a Model for Parallel Computations: Determinacy, Termination, Queueing, SIAM Journal on Applied Mathematics 14:6 (1966) pp. 1390-1411.
Kl80a Klop, J. W. Combinatory Reduction Systems, Ph.D. dissertation, Mathematisch Centrum, Amsterdam (1980).
Kl80b Klop, J. W. Reduction Cycles in Combinatory Logic, To H. B. Curry, Seldin and Hindley, eds., Academic Press (1980) pp. 193-214.
Kn68 Knuth, D. E. Semantics of Context-Free Languages, Mathematical Systems Theory 2:2 (1968) pp. 127-146.
KMP77 Knuth, D. E., Morris, J., Pratt, V. Fast Pattern Matching in Strings, SIAM Journal on Computing 6:2 (1977) pp. 323-350.
KB70 Knuth, D. E. and Bendix, P. Simple Word Problems in Universal Algebras, Computational Problems in Abstract Algebra, J. Leech, ed., Pergamon Press, Oxford (1970) pp. 263-297.
Ko79a Kowalski, R. Algorithm = Logic + Control, Communications of the ACM 22:7 (1979) pp. 424-436.
Ko79b Kowalski, R. Logic for Problem Solving, Elsevier North-Holland, New York (1979).
La65 Landin, P. J. A Correspondence Between ALGOL 60 and Church's Lambda-Notation: Part I, Communications of the ACM 8:2 (1965) pp. 89-101.
Le83 Lescanne, P. Computer Experiments with the REVE Term Rewriting System Generator, 10th Annual Symposium on Principles of Programming Languages, Austin TX (1983).
Le68 Lewis, P. M. II and Stearns, R. E. Syntax-Directed Transduction, Journal of the ACM 15:3 (1968) pp. 465-488.
McC60 McCarthy, J. Recursive Functions of Symbolic Expressions and Their Computation by Machine, Communications of the ACM 3:4 (1960) pp. 184-195.
McC62 McCarthy, J. Towards a Mathematical Science of Computation, IFIP Munich Conference 1962, North-Holland, Amsterdam (1963).
McI68 McIlroy, M. D. Coroutines, Internal report, Bell Telephone Laboratories, Murray Hill, New Jersey (1968).
MvD82 Meyerowitz, N. and van Dam, A. Interactive Editing Systems: Part II, ACM Computing Surveys 14:3 (1982) pp. 353-415.
My72 Myhill, J. What is a Real Number? American Mathematical Monthly 79:7 (1972) pp. 748-754.
NO78 Nelson, G. and Oppen, D. C. A Simplifier Based on Efficient Decision Algorithms, 5th Annual ACM Symposium on Principles of Programming Languages (1978) pp. 141-150.
NO80 Nelson, G. and Oppen, D. C. Fast Decision Algorithms Based on Congruence Closure, Journal of the ACM 27:2 (1980) pp. 356-364.
O'D77 O'Donnell, M. Computing in Systems Described by Equations, Lecture Notes in Computer Science v. 58, Springer-Verlag (1977).
O'D79 O'Donnell, M. J. Letter to the Editor, SIGACT News 11:2 (1979) p. 2.
OS84 O'Donnell, M. J. and Strandh, R. I. Toward a Fully Parallel Implementation of the Lambda Calculus, Technical Report JHU/EECS-84/13, The Johns Hopkins University (1984).
OI79 Owens, R. M. and Irwin, M. F. On-Line Algorithms for the Design of Pipeline Architectures, Annual Symposium on Computer Architecture, Philadelphia (1979) pp. 12-19.
RoG77 Roberts, G. An Implementation of Prolog, M.S. Thesis, Dept. of Computer Science, University of Waterloo (1977).
Ro65 Robinson, J. A. A Machine-Oriented Logic Based on the Resolution Principle, Journal of the ACM 12:1 (1965) pp. 23-41.
Ro79 Robinson, J. A. Logic, Form and Function: the Mechanization of Deductive Reasoning, Elsevier North-Holland, New York (1979).
Re84 Reps, T. Generating Language-Based Environments, MIT Press, Cambridge, MA (1984).
RTD83 Reps, T., Teitelbaum, T., and Demers, A. Incremental Context-Dependent Analysis for Language-Based Editors, ACM Transactions on Programming Languages and Systems 5:3 (1983) pp. 449-477.
Ro73 Rosen, B. K. Tree Manipulation Systems and Church-Rosser Theorems, Journal of the ACM 20:1 (1973) pp. 160-187.
Sc24 Schönfinkel, M. Über die Bausteine der mathematischen Logik, Math. Ann. 92 (1924) pp. 305-316.
St77 Staples, J. A Class of Replacement Systems with Simple Optimality Theory, Bulletin of the Australian Mathematical Society 17:3 (1977) pp. 335-350.
St79 Staples, J. A Graph-Like Lambda Calculus For Which Leftmost-Outermost Reduction Is Optimal, Graph Grammars and Their Application to Computer Science and Biology, Lecture Notes in Computer Science v. 73, V. Claus, H. Ehrig, G. Rozenberg eds., Springer-Verlag (1979).
St72 Stenlund, S. Combinators, Lambda-Terms, and Proof Theory, D. Reidel Publishing Company, Dordrecht, Holland (1972).
ST84 Strandh, R. I. Incremental Suffix Trees with Multiple Subject Strings, Technical Report JHU/EECS-84/18, The Johns Hopkins University (1984).
Th73 Thatcher, J. W. Tree Automata: An Informal Survey, Chapter 4 of Currents in the Theory of Computing, A. V. Aho, editor, Prentice-Hall, Englewood Cliffs NJ (1973) pp. 143-172.
Th85 Thatte, S. On the Correspondence Between Two Classes of Reduction Systems, to appear in Information Processing Letters (1985).
Tu79 Turner, D. A. A New Implementation Technique For Applicative Languages, Software - Practice and Experience v. 9 (1979) pp. 31-49.
Vu74 Vuillemin, J. Correct and Optimal Implementations of Recursion in a Simple Programming Language, Journal of Computing and Systems Science 9:3 (1974) pp. 332-354.
WA85 Wadge, W. W. and Ashcroft, E. A. Lucid, the Dataflow Programming Language, Academic Press, London (1985).
Wa76 Wand, M. First Order Identities as a Defining Language, Acta Informatica v. 14 (1976) pp. 337-357.
Wal77 Warren, D. Implementing Prolog, Research Reports #39, 40, Dept. of Artificial Intelligence, University of Edinburgh (1977).
Wi83 Winograd, T. Language as a Cognitive Process, Addison-Wesley, Reading, MA (1983).
Index
A (atomic symbol) 267-268
A (combinator) 227
abstract alphabet 103
abstract data types 4
abstract symbol 267-268
abstract syntax 99,100,264-265; (representation of _) 266-269,270-275
add 25
addint 25
addition of integers, arbitrary precision 41-46
addition of reals, infinite precision 46-51
addition table 39-41
addition of polynomials 51-54
address of variable in expression 72-73,269,274
Aho 31,33,98,103,106,146,149,192,193,205,215
algorithms for equational computing 187-219
alpha conversion 55-56
alphabet, abstract vs. concrete 103
ambiguous notational specification 106
applications of equational computing 4-5
arbitrary precision integer operations 41-46
are (keyword) 27-28
arithmetic 24-25; (arbitrary precision _) 41-46
arrays 157-161
Ashcroft 62,139
assertions, programming with 1
associativity 34
Atkins 46
atomic symbol 267-268
atomic symbols 22-23,269; (functions on _) 24-26;
(representing polynomial variables) 52
atomsym 112,272-273,275
attribute grammar 98,106,144-145
auxiliary symbols in notational specifications 102,103,108-110
Avizienis 46
back-pointers 189-191
Backus 132,220
Belnap 3
Bendix 4,84
Berry 75
beta reduction 55-62
Bishop 46
BNF for equational programs 11-12
bottom-up pattern matching 194-199
bound function 237
branching function 223-224
Bridges 46
Burstall 11,124,126,127,130
286
C (character string) 267-268
canonical encoding 238,250
Cargill 63
char 26,113,275
character string 267-268
characters 22,25,269
charint 26
Chew 151,217,218,261
Chomsky 99,264
Church 54,55,222,224,228,256
Church-Rosser property 135-136,224
circular list of sons and fathers 190-191
Clarke 222,226
classes of equations in an equational program 11,24-26
classes of symbols in an equational program 10,22-23
Combinator Calculus 54-55,221-222,226
comments in an equational program 9
commutativity 34
compiler for equations 6
complete notational specification 106
complete reduction sequence 199,212
computable functions 221
computation relation 236
computational world 125
computing scenario 1,3
concrete alphabet 103
concrete syntax 264
concurrency 132-134, (dataflow) 137-145
conditional, simulation by equations 84-87
conditional OR 244
confluence 135-136,224,261
congruence closure 150,217-218
cons 14
consistent rule schemata 179
constants in an equational program 10,22
constructive real numbers 46
constructor symbols 78-84
content 115,275
content of abstract symbol 267,268-269
context 184
context-free grammar, improved 100-112; (decomposing a _) 106-110
context-free notational specification 104,106
context-free syntactic processing 264-265
Corasick 192,193,205
core programs 263-264,266
counter for pattern-matching 201
Curry 54,100,222,226,228,231,233,234
Curry inner syntax in an equational program 18
Currying 18,82-84,264
D (combinator) 226-227
data structures 151-176; (_ for equational computing) 187-219
dataflow 137-145
de Bruijn 55,56
de Bruijn notation for lambda terms 17, 56-58
decoding function 236
deep abstract syntax 264-265,272-275
deep structure 99,264-265
defined symbol in an equational program 10
defined symbols vs. constructors 78-84
definite potential redex 181
definitions 6-7,9
def.deep 6
def.in 6
Demers 144
denoted 104
derive 130
dequspec 274
diagnostic aids for equational programming 71-74
Dijkstra 134,135
directed congruence closure 217-218
disjunction in variable qualification 28,199,204-205,209-210
displaying equations graphically 71-73
divide 25
divint 25
Downey 218
dspec 274
dsymspec 274
dummy nodes in reductions 214
dynamic programming 4,145-150,216,217
effective reduction system 223-224
effective simulation 237
ei (equation interpreter) 6-8,74
either (keyword) 27-28
el (equation lexicon) 71
empty term 104
encoder, semantic 249-250; (parallel _) 254-255
encoder, syntactic 248-249
encoding function 250-251; (parallel _) 255-256
encoding set 236
endwhere (keyword) 27-28
enrich 127
ep (equation preprocessor) 6-8
equ 25
equality test 261
equate 274
equation in an equational program 10
equation classes in an equational program 11,24-26
Equations (keyword) 9
equatom 25
equchar 25
equint 25
Eratosthenes’ sieve 139-142
error, arithmetic 25
errors in an equational program 68-70
error processing by equational programs 87-90
es (equation display) 71-73
essential redex 184
et (equation trace) 73-74
evaluation operator 62
exact real arithmetic, 46-51
exceptional cases in the equation interpreter 68-70
exceptional conditions, processing by equational programs 87-90
failures of the equation interpreter 68-70
false 22
Feys 54,100,222,226,228,231,233,234
finite automata 192-193,205
fixpoint function 228-229
flat type 100
flattened pattern-matching 205-212
For all (keyword) 9
formal object 100
format for abstract symbols 266-269
Friedman 2,4,231
functions, predefined 24-26,269; (extensions of _ to arbitrary precision) 41-46
Futatsugi 4,215
garbage collection 219
Garey 210
Gladstone 222,226
Goguen 4,11,124,126,127,130,215
Golick 77
grammar (attribute _) 98,144-145; (transformational _) 98-99
graph of a function 145-146
graph representing a tree 188
graphic display of equations 71-73
guarded commands 134-135
guest 236
Guibas 169,173,176
Guttag 4,80
hashed sharing 215-217
Henderson 2,4,231
hiding symbols 129-131
history of the equation interpreter 75-77
Hoare 33
Hoffmann 3,63,75-77,115,161,193,194,197,199,201
homomorphic encodings of terms 103
Hopcroft 31,33,146,149,215
Horner rule polynomial 51-52
Horning 4,80
host 236
Huet 4,75,177,179,181,182,183,199,232
Huffman 31
Huffman code 31-33
I (integer symbol) 267-268
I (combinator) 227
identifier in an equational program 10
if, parallel 261
implementation 262-275; (algorithms and data structures in the _) 187-219
incomparable subpatterns 197
incremental input/output 142-145
incremental pattern-matching 193,195-197,201,202
incremental preprocessing of equations 210
indeterminacy 134-137,227,247
index 181
indexing of partial recursive functions 228
indirection 214
infinite data structures 39-41,46-51,147,154,160
infinite precision reals 46-51
inner syntaxes in an equational program 17-19,264,275
instance 184
int.in 6-8,13,270
int.out 6-8,13,270
intchar 26
integer symbol 267-268
integer_numerals 22,269; (functions on _) 25
interpretation 104-105
interpreter, use of 6-8
interpreter 6-7
intersection of models 127
intnum 113,272,275
intractable problems 209,210
Irons 103
irrelevant computation steps 3
Irwin 46
is (keyword) 27-28
Johnson, D.S. 210
Johnson, S.C. 69
Jouannaud 4,215
K (combinator) 226
Kahn 139,181
Karp 137
keywords of the equational programming language 9
Klop 59,179,228,231,259,261
Knuth 4,84,98,192
Kowalski 2,192
L (literal symbol) 267-268
Lambda syntax in an equational program 15-17
lambda abstraction 227-228
lambda calculus 55-62,256-260
lambda-term 228,256
Landin 99,100
lazy evaluation 2,4,62,134,231,239
least fixpoint function 228-229
left context 184
left linear rule schema 178
left-traversal context 184
leftmost-outermost reduction 186,222
left-sequentiality 20-21,183-186; (_ with qualifications on variables) 28-29;
(repairing failures of _) 94-97; (strong _) 183-186,207
Lescanne 4
less 25
lessint 25
level of the equation interpreter 124
Lévy 75,177,179,181,182,183,199,232
Lewis 103
lexical structure, relevance of 268
lexicon for equational program 71
libraries of equational programs 124,151
linear rule schema 178
LISP 4,14,30,62,151-153; (simulation of _ conditionals) 84-87
LISP.M syntax in an equational program 13,14
lists 151-157; (_ represented by combinators) 229-231
list, infinite 39-41,46-51,147,154
list reversal 30-31
literal symbol in an equational program 10,13,23,267-268
litsym 113,272,274
loadsyntax 6-7,13
locking nodes (_ to prevent overlap) 166-173; (_ in nonsequential programs) 211
logical consequence 1,2,3
Lucid 62-67,139
M (metasymbol) 267-269
M-expression 14
MacLean 222,226
MacQueen 139
matching state (set) 194,198-199,204
matrix multiplication, optimal 146-150
McCarthy 14,75,99,100,220
Mcllroy 139
meanings of equational programs 124-126
Meseguer 4,215
messages, error and failure 68-70
metanotation for terms 112-115
metasym 272,274
metasymbol 267-269
Meyerowitz 111
Miller 137
model 125
modification
modint 25
modular equational programs 124-131
modulo 25
Morris 2,4,192,231
multiap 113,272,274
multidimensional arrays 159
multint 25
multiplication of integers, arbitrary precision 41-46
multiply 25
Myhill 46
Nelson 151,217,218
nested qualifications on variables 28
nil 14
nonambiguous notational specification 106
non-context-free syntactic processing 5,115-123,264-266
non-context-free notational specifications 110-111
nondeterminism 134-137,260; (_ vs. parallelism) 238-239,247
nonoverlapping equations 20,251; (_ with qualifications on variables) 28-29;
(_ achieved by the constructor discipline) 78-84; (repairing overlaps) 90-94
nonoverlapping rule schemata 179
nonsequential equations 20,209,210-212; (repairing _) 94-97
normal forms 3,125,223,224; (uniqueness of _) 126,128,135,224; (total _) 181
Norman 222,226
notational specification 102-112; (simple or context-free _) 104,106
OBJ 4
ob 100
O’Donnell 2,3,60,63,75-77,161,177,179,193,194,197,199,201,212,214,231,232,259
Oppen 4,151,217,218
optimal matrix multiplication 146-150
optimization of equational programs 214-215
or (keyword) 27-28
OR, parallel 66-67,222,260; (positive _) 261
OR, conditional 244
outermost evaluation 62,134,142,258
output set 125
overflow 24,25
overlap in equations 20; (_ with qualifications on variables) 28-29;
(_ avoided by the constructor discipline) 78-84; (eliminating _) 90-94
overwrite
Owens 46
pairing function 229-230
parallel encoder 255-256
parallel if 261
parallel OR 66-67,222,260; (positive _) 261
parallel semantic encoder 254-255
parallelism vs. nondeterminism 238-239,247
parsing errors in the equation interpreter 69
partial recursive function 228
partial redex 181,248,250
pattern-matching 187,191-210
pipeline, connecting equational programs 7
pipeline, simulated by equations 4
Plotkin 181
pointers in representations of expressions 187-191
polynomial addition 51-54
positive parallel OR 261
possibility sets 198-199,202-204
potential redex 182; (definite _) 181
power series form for polynomials 54
Pratt 192
precision of integers, arbitrary 41-46
precision of reals, infinite 46-51
predefeq 272
predefined symbols 22-23,269
predefined equations 24-26; (extensions of _ for arbitrary precision) 41-46
predefsym 272,274
preprocessor, use of 6-8
pre.in 6-8,13,270
prime number sieve 139-142
primitive operations 221
programming techniques (low-level) 78-97; (high-level) 132-150;
(data structures) 151-176
projection function 229-230
Prolog 2,3,191-192
property A 233,234,260
proving theorems 34-39
qualequ 272
qualifications on variables 27-29,199,204-205,209-210
qualify 272
qualterm 272
qualvar 275
question-and-answer computing 1,3
Quicksort 33-34
random-access machine 221
real arithmetic, exact 46-51
reclaiming storage 219
recursive definitions 228-229
recursive separation 260-261
redex 181,232,248,250
reduction 223
reduction, implementation 187,212-219
reduction system 223-226
reference count 219
regular term reduction systems 80,179,252-256;
(conversion of _ to constructor discipline) 80-84
regular tree languages 101
relevant computation steps 3
repeated variables 20; (_ with qualifications on variables) 28-29
representing expressions 187-191
Reps 111,144
residual 232
restrictions on equations 20,78; (_ with qualifications on variables) 28-29
restrictions on variables 27-29,199,204-205,209-210
restrictions on symbols 74
REVE 4
rewriting rules 3,177-178
root index, strong 184
root stable 181; (strongly _) 182
root-essential 184
Rosen 177
Rosser 224
rule schema 177-178
S (combinator) 226
S-K calculus 226,244-247,248-252,261
S-K-A calculus 227,244-247
S-K-C calculus 261
S-K-D calculus 226-227,244-247,252-256
S-K-E calculus 261
S-K-F calculus 261
S-K-P calculus 261
Sacco 76
satisfies 125
scenario, computing 1,3
Schönfinkel 82,226
search trees 161-176
Sedgewick 169,173,176
selecting redexes 187,193,198-199,202-204,210-212
semantic encoder 249-250; (parallel _) 254-255
semantic errors and failures 70
semantics of programming languages 1,2,265
seqno 26
sequencing function 183
sequencing reductions 187,193,198-199,202-204,210-212
sequential equations 20-21; (_ with qualifications on variables) 28-29;
(repairing failures of sequentiality) 94-97
sequential reduction system 180-182
sequentiality 180-183,232
set, matching 194,198-199,204
set, possibility 198-199,202-204
Sethi 218
sharing of equivalent subtrees 215-218
shells, syntactic 263-264
sieve of Eratosthenes 139-142
simple notational specification 104,106
simply strongly sequential 183,248-252
simulation of reduction systems 235-244
sorting 33-34; (relation of _ to commutativity and associativity) 34-35
space limits in the equation interpreter 8
sparse arrays 160,161
sequspec 270
sspec 270
ssymspec 270
stable, root 181
stable symbol 143,198-199,248,250
standard reduction sequence 233-234
Standmath inner syntax in an equational program 18
Standmath syntax in an equational program 13
Staples 59
state of reduction system 223
state, matching 194
Stearns 103
Steel 3
Stenlund 54,82,222,226,227,228,231,258
stepwise simulation 221
storage management 219
Strandh 60,210,259
string language 105
strong index 182
strong reduction 226
strong root index 184
strongly left-sequential 183-186,207
strongly root stable 182
strongly sequential 182,261
structure editors 111-112
structured equational programming 124-131
subint 25
subtract 25
subtree replacement system 177
sufficient completeness 80
sum of computational worlds 127
surface abstract syntax 264,270-273
surface structure 99,264
symbol, abstract 267-268
symbol, defined, in an equational program 9-10
symbol classes in an equational program 10,22-23
Symbols (keyword) 9
syntactic encoder 248-249
syntactic errors and failures 69-70
syntactic processing, non-context-free 5,264-266
syntactic processing by equations 98-123
syntactic processors 270-275; (user-defined _) 58
syntactic shells 263-264
syntactic transform of worlds 129-131
syntax 114
syntax, abstract 99,100,264-265,270-275
syntax, concrete 264
syntax, represented by terms 112-115
syntax-directed translation schema 103,105-106
syntaxes for terms in an equational program 13-19
S-expression 14
T (truth symbol) 267-268
t (trace option to ei) 73-74
table 161-176; (infinite _) 39-41,147
Tarjan 218
Teitelbaum 144
term reduction (rewriting) system 177-178
terms in an equational program 13-19
terms representing terms 112-115
Thatcher 101
Thatte 80,94
theorem proving 4,34-39
top-down pattern-matching 199-205
total normal form 181
trace output from equational programs 73-74
trace.inter 74
transform, syntactic 129-131
transformational grammar 98-99,264
transition function 224
translations from terms to strings 105
translation in/out of metanotation 114-115
tree 177
tree isomorphism 215
tree language 100
trie 192-193
true 22
truth symbol 267-268
truthsym 113
truthval 275
universal sequential machine language 221
universal reduction system 244,246
universe 125
UNIX 6-8
unlimited precision operations on integers 41-46
unparsers 106,111-112
unstable symbol 143
usersym 272,274
van Dam 111
varaddr 274
variable 275
variables in equational programs 9-10;
(qualifications on _) 27-29,199,204-205,209-210
variables in lambda calculus 256-257
variables in polynomials as atomic symbols 52
variable addresses 72-73,269,274
Wadge 62,139
Wand 4
wasted computation steps 3
weak reduction 227
weak simulation 236-237,239-240
Wise 2,4,231
where (keyword) 27-29
workspace 8
world, computational 125
The MIT Press, with Peter Denning as consulting editor, publishes computer sci-
ence books in the following series:
ACM Doctoral Dissertation Award and Distinguished Dissertation Series
Artificial Intelligence, Patrick Winston and Michael Brady, editors
Charles Babbage Institute Reprint Series for the History of Computing, Martin
Campbell-Kelly, editor
Computer Systems, Herb Schwetman, editor
Foundations of Computing, Michael Garey, editor
History of Computing, I. Bernard Cohen and William Aspray, editors
Information Systems, Michael Lesk, editor
The MIT Electrical Engineering and Computer Science Series
Scientific Computation, Dennis Gannon, editor
For information on submission of manuscripts for publication, please call or write
to:
Frank P. Satlow
Executive Editor
The MIT Press
28 Carleton Street
Cambridge, MA 02142
617/253-1623